Abstract. We examine normal form solutions of decision trees under typical choice functions induced by lower previsions. For large trees, finding such solutions is hard as very many strategies must be considered. In an earlier paper, we extended backward induction to arbitrary choice functions, yielding far more efficient solutions, and we identified simple necessary and sufficient conditions for this to work. In this paper, we show that backward induction works for maximality and E-admissibility, but not for interval dominance and Γ-maximin. We also show that, in some situations, a computationally cheap approximation of a choice function can be used, even if the approximation violates the conditions for backward induction; for instance, interval dominance with backward induction will yield at least all maximal normal form solutions.



1. Introduction 

In classical decision theory, one aims to maximize expected utility. Such an approach requires probabilities for all relevant events. However, when information and knowledge are limited, the decision maker may not be able to specify or elicit probabilities exactly. To handle this, various theories have been suggested, including lower previsions [35], which essentially amount to sets of probabilities.

In non-sequential problems, given a lower prevision, various generalizations of maximizing expected utility exist [31]. Sequential extensions of some of these alternatives have been suggested […], yet not systematically studied. In this paper, we systematically study which decision criteria for lower previsions admit efficient solutions to sequential decision problems by backward induction, even if probabilities are not exactly known. Our main contribution is to prove for which criteria backward induction coincides with the usual normal form.

We study very general sequential decision problems: a subject can choose from a set of options, where each option has uncertain consequences, leading to either rewards or more options. Based on her beliefs and preferences, the subject seeks an optimal strategy. Such problems are represented by a decision tree […].

When maximizing expected utility, one can solve a decision tree by the usual normal form 
method, or by backward induction. First, note that the subject can specify, in advance, her actions 
in all eventualities. In the normal form, she simply chooses a specification which maximizes her 
expected utility. However, in larger problems, the number of specifications is gargantuan, and 
the normal form is not feasible. 

Fortunately, backward induction is far more efficient. We find the expected utility of each option at the final decision nodes, and then replace these nodes with the maximum expected utility. The previously penultimate decision nodes are now ultimate, and the process repeats until the root is reached. Backward induction is guaranteed to coincide with the normal form [25] if probabilities are non-zero [8, p. 44].

Key words and phrases. Backward induction; decision tree; lower prevision; sequential decision making; choice function; maximality; E-admissibility; interval dominance; maximin; imprecise probability.

NATHAN HUNTLEY AND MATTHIAS C. M. TROFFAES

The usual normal form method works easily with decision criteria for lower previsions: apply 
it to the set of all strategies. Generalizing backward induction is harder, as no single expectation 
summarizes all relevant information about substrategies, unlike with expected utility. We follow 
Kikuti et al. [14], and instead replace nodes with sets of optimal substrategies, moving from
right to left in the tree, eliminating strategies as we go. De Cooman and Troffaes [5] presented 
a similar idea for dynamic programming. 

In this general setting, normal form and backward induction can differ, as noted by many […]. However, for some decision criteria the methods always coincide. In earlier work [12], we found conditions for coincidence. In this paper, we expand that work and investigate which criteria for lower previsions satisfy these conditions, finding that maximality and E-admissibility do, but interval dominance and Γ-maximin do not.

This coincidence is of interest for at least two reasons. First, as mentioned, the normal form is not feasible for larger trees, whereas backward induction can eliminate many strategies early on, and is hence far more efficient. Second, one might argue that a solution where the two methods differ is philosophically flawed […].

The paper is organized as follows. Section 2 explains decision trees and introduces notation. Section 3 presents lower previsions and their decision criteria, and demonstrates normal form backward induction on a simple example. Section 4 formally defines the two methods and characterizes their equivalence, which is applied to lower previsions in Section 5. Section 6 discusses a larger example. Section 7 concludes. Readers familiar with decision trees and lower previsions can start with Sections 3.3 and 5.

2. Decision Trees 

2.1. Definition and Example. Informally, a decision tree […] is a graphical causal representation of decisions, events, and rewards. Decision trees consist of a rooted tree […, p. 92, Sec. 3.2] of decision nodes, chance nodes, and reward leaves, growing from left to right. The left-hand side corresponds to what happens first, and the right-hand side to what happens last.

Consider the following example. Tomorrow, a subject is going for a walk in the lake district. It may rain ($E_1$) or not ($E_2$). The subject can either take a waterproof ($d_1$), or not ($d_2$). But the subject may also choose to buy today's newspaper, at cost $c$, to learn about tomorrow's weather forecast ($d_3$), or not ($d_4$), before leaving for the lake district. The forecast has two possible outcomes: predicting rain ($S_1$), or not ($S_2$).

The corresponding decision tree is depicted in Figure 1. Decision nodes are depicted by
squares, and chance nodes by circles. From each node, a number of branches emerge, representing 
decisions at decision nodes and events at chance nodes. The events from a node form a partition 
of the possibility space: exactly one of the events will take place. Each path in a decision tree 
corresponds to a sequence of decisions and events. The reward from each such sequence appears 
at the right hand end of the branch. 

2.2. Notation. A particular decision tree can be seen as a combination of smaller decision 
trees: for example, one could draw the subtree corresponding to buying the newspaper, and also 
draw the subtree corresponding to making an immediate decision. The decision tree for the full 
problem is then formed by joining these two subtrees at a decision node. 

So, we can represent a decision tree by its subtrees and the type of its root node. Let $T_1, \dots, T_n$ be decision trees. If $T$ combines the trees at a decision node, we write

$$T = \bigsqcup_{i=1}^{n} T_i.$$
[Figure 1: decision tree; after buying the newspaper, the reward leaves under each forecast read $10 - c$, $15 - c$, $5 - c$ and $20 - c$.]

Figure 1. A decision tree for walking in the lake district.

If $T$ combines the trees at a chance node, with subtree $T_i$ connected by event $E_i$ (where $E_1, \dots, E_n$ is a partition of the possibility space), we write

$$T = \bigodot_{i=1}^{n} E_i T_i.$$

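For instance, for the tree of Fig. 1 with $c = 1$, we write

$$\bigl(S_1(T_1 \sqcup T_2) \odot S_2(T_1 \sqcup T_2)\bigr) \sqcup (U_1 \sqcup U_2),$$

where, denoting the reward nodes by their utility,

$$T_1 = E_1\,9 \odot E_2\,14, \qquad T_2 = E_1\,4 \odot E_2\,19, \qquad U_1 = E_1\,10 \odot E_2\,15, \qquad U_2 = E_1\,5 \odot E_2\,20.$$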

The above notation shall prove very useful when considering recursive definitions. 

In this paper we often consider subtrees of larger trees. For subtrees, it is important to know 
the events that were observed in the past. Two subtrees with the same configuration of nodes 
and arcs may have different preceding events, and should be treated differently. Therefore we 
associate with every decision tree T an event ev(T) representing the intersection of all the events 
on chance arcs that have preceded T. 

Definition 1. The subtree of a tree $T$ obtained by removal of all non-descendants of a particular node $N$ is called the subtree of $T$ at $N$ and is denoted by $\mathrm{st}_N(T)$.

These subtrees are called 'continuation trees' by Hammond [8].
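Consider all possible ways that sets of decision trees $\mathcal{T}_1, \dots, \mathcal{T}_n$ can be combined. Our notation easily extends. For any sets of decision trees $\mathcal{T}_1, \dots, \mathcal{T}_n$ and any partition $E_1, \dots, E_n$,

$$\bigsqcup_{i=1}^{n} \mathcal{T}_i = \Bigl\{ \bigsqcup_{i=1}^{n} T_i : T_i \in \mathcal{T}_i \Bigr\}, \qquad \bigodot_{i=1}^{n} E_i \mathcal{T}_i = \Bigl\{ \bigodot_{i=1}^{n} E_i T_i : T_i \in \mathcal{T}_i \Bigr\}.$$

For convenience, we only work with decision trees in which no event arc is impossible given the preceding events.

Definition 2. A decision tree $T$ is called consistent if for every node $N$ of $T$,

$$\mathrm{ev}(\mathrm{st}_N(T)) \neq \emptyset.$$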

Clearly, if a decision tree $T$ is consistent, then for any node $N$ in $T$, $\mathrm{st}_N(T)$ is also consistent. Considering only consistent trees is not really a restriction, since inconsistent trees would only be drawn due to an oversight and could easily be made consistent.

2.3. Solving Decision Trees with Probabilities and Utilities. We give a brief overview of the standard method of solving a decision tree when probabilities of events are known. Suppose in Fig. 1 we have $p(S_1) = 0.6$, $p(E_1 \mid S_1) = 0.7$, and $p(E_1 \mid S_2) = 0.2$, so $p(E_1) = 0.5$. We first calculate the expected utility of the final chance nodes. For example, the expected utility at $N_{111}$ is $0.7(10 - c) + 0.3(15 - c) = 11.5 - c$, and the expected utility at $N_{112}$ is $9.5 - c$.

We now see that at $N_{11}$ it is better to choose decision $d_1$. We then replace $N_{11}$ and its subtree with the expected utility of $N_{111}$: $11.5 - c$. We follow the same procedure for $N_{12}$ and $N_2$, and the tree has been reduced by a stage. We find that $d_2$ is optimal at $N_{12}$ with value $17 - c$, and at $N_2$ both decisions are optimal with value $12.5$.

Next, we take the expected utility at $N_1$, which is $0.6(11.5 - c) + 0.4(17 - c) = 13.7 - c$. At $N$, we therefore take decision $d_3$ if $c < 1.2$ and $d_4$ if $c > 1.2$. This procedure is illustrated in Fig. 2, where the dashed lines indicate decision arcs that are rejected because their expected utility is too low (for any specific $c \neq 1.2$, either $d_3$ or $d_4$ would be dashed).

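The expected-utility backward induction above can be sketched in a few lines of Python. This is an illustration under our own assumptions, not code from the paper: the nested-tuple tree encoding and the names `solve`, `lake_tree` and `walk` are hypothetical, and exact fractions are used so that the tie at $N_2$ is detected rather than lost to rounding.

```python
from fractions import Fraction

# Hypothetical tree encoding (not the paper's notation):
#   ("reward", value)
#   ("chance", [(probability, subtree), ...])
#   ("decision", name, {label: subtree, ...})

def solve(node, choices):
    """Expected-utility backward induction.

    Returns the value of `node`; records in `choices` the list of optimal
    decision labels at every decision node that is visited.
    """
    kind = node[0]
    if kind == "reward":
        return node[1]
    if kind == "chance":
        return sum(p * solve(child, choices) for p, child in node[1])
    _, name, arms = node
    values = {label: solve(child, choices) for label, child in arms.items()}
    best = max(values.values())
    choices[name] = sorted(label for label, v in values.items() if v == best)
    return best


def lake_tree(c):
    """Lake district tree with p(S1) = 0.6, p(E1|S1) = 0.7, p(E1|S2) = 0.2."""
    def walk(p_rain, cost):
        # d1: take the waterproof, d2: do not
        return {
            "d1": ("chance", [(p_rain, ("reward", 10 - cost)),
                              (1 - p_rain, ("reward", 15 - cost))]),
            "d2": ("chance", [(p_rain, ("reward", 5 - cost)),
                              (1 - p_rain, ("reward", 20 - cost))]),
        }
    buy = ("chance", [
        (Fraction(6, 10), ("decision", "N11", walk(Fraction(7, 10), c))),
        (Fraction(4, 10), ("decision", "N12", walk(Fraction(2, 10), c))),
    ])
    skip = ("decision", "N2", walk(Fraction(1, 2), 0))
    return ("decision", "N", {"d3": buy, "d4": skip})


choices = {}
value = solve(lake_tree(Fraction(1, 2)), choices)
print(value, choices)
```

With $c = 1/2$ the sketch returns $66/5 = 13.7 - 0.5$ and selects $d_3$ at $N$, in line with the threshold $c < 1.2$; at $c = 6/5$ both $d_3$ and $d_4$ are recorded as optimal.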
This method can only be carried out if the subject has assessed precise probabilities and 
utilities and wishes to maximize expected utility. It may be that the subject is unable or un- 
willing to comply with these requirements. The next section considers a possible solution, and 
demonstrates how backward induction can be generalized. 



3. Credal Sets and Lower Previsions

First, we outline a straightforward generalization of the theory of probability, allowing the subject to model uncertainty in cases where too little information is available to identify a unique probability distribution (see for instance […]).

3.1. Gambles, Credal Sets and Lower Previsions. The possibility space $\Omega$ is the set of all possible states of the world. Elements of $\Omega$ are denoted by $\omega$. Subsets of $\Omega$ are called events, and are denoted by capital letters $A$, $B$, etc. The arcs emerging from chance nodes in a decision tree correspond to events.

A gamble is a function $X\colon \Omega \to \mathbb{R}$, interpreted as an uncertain reward: should $\omega \in \Omega$ be the true state of the world, the gamble $X$ will yield the reward $X(\omega)$.

A probability mass function is a non-negative real-valued function $p\colon \Omega \to \mathbb{R}^{+}$ whose values sum to one [35, p. 138, Sec. 4.2]. For convenience, we will write $p(A)$ for $\sum_{\omega \in A} p(\omega)$.
Figure 2. Solving the Lake District problem with expected utility.



For the purpose of this paper, we assume that

• $\Omega$ is finite,

• rewards are expressed in utiles,

• the subject can express her beliefs by means of a closed convex set $\mathcal{M}$ of probability mass functions $p$ ($\mathcal{M}$ is called the credal set), and

• each probability mass function $p \in \mathcal{M}$ satisfies $p(\omega) > 0$ for all $\omega \in \Omega$.

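Under the above assumptions, each $p$ in $\mathcal{M}$ determines a conditional expectation

$$E_p(X \mid A) = \frac{\sum_{\omega \in A} p(\omega) X(\omega)}{p(A)},$$

and the whole set $\mathcal{M}$ determines a conditional lower and upper expectation

$$\underline{P}(X \mid A) = \min_{p \in \mathcal{M}} E_p(X \mid A), \qquad \overline{P}(X \mid A) = \max_{p \in \mathcal{M}} E_p(X \mid A),$$

and this for every gamble $X$ and every non-empty event $A$.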

The functional $\underline{P}$ is called a coherent conditional lower prevision, and similarly, $\overline{P}$ is called a coherent conditional upper prevision. Although here we have defined these by means of a set of probability measures, there are different ways of obtaining and interpreting lower and upper previsions (see for instance Miranda [22] for a survey).

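For a credal set that happens to be the convex hull of finitely many mass functions (an assumption made only for this sketch; the text requires $\mathcal{M}$ to be closed and convex), the minimum and maximum above can be found by scanning the extreme points, because $E_p(X \mid A)$ is a ratio of two linear functions of $p$ with a positive denominator and so attains its extremes over a polytope at vertices. A minimal Python sketch with hypothetical function names:

```python
from fractions import Fraction

def cond_expectation(p, X, A):
    """E_p(X|A) for a pmf p and a gamble X (dicts keyed by state) and an event A."""
    return sum(p[w] * X[w] for w in A) / sum(p[w] for w in A)

def lower_prevision(points, X, A):
    # Assumes the credal set is the convex hull of the pmfs in `points`;
    # the linear-fractional E_p(X|A) is then minimized at one of them.
    return min(cond_expectation(p, X, A) for p in points)

def upper_prevision(points, X, A):
    return max(cond_expectation(p, X, A) for p in points)

points = [{"a": Fraction(1, 4), "b": Fraction(3, 4)},
          {"a": Fraction(1, 2), "b": Fraction(1, 2)}]
X = {"a": 1, "b": 0}
omega = {"a", "b"}
print(lower_prevision(points, X, omega), upper_prevision(points, X, omega))  # 1/4 1/2
```

Negating $X$ swaps and negates the two bounds, which is a quick numerical check of Proposition 3(iv).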
Following De Finetti [6], where convenient we denote the indicator gamble

$$I_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A, \end{cases}$$

of an event $A$ also simply by $A$: for instance, if $X$ is a gamble and $A$ is an event, then $AX$ is just shorthand notation for $I_A X$.

Below are some properties of coherent conditional lower and upper previsions that we require later (see Williams [36, 37] or Walley [35] for proofs).

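Proposition 3. For all non-empty events $A, B$, gambles $X, Y$, and constants $\lambda > 0$:

(i) If $AX = AY$ then $\underline{P}(X \mid A) = \underline{P}(Y \mid A)$.

(ii) $\underline{P}(X \mid A) + \underline{P}(Y \mid A) \leq \underline{P}(X + Y \mid A) \leq \underline{P}(X \mid A) + \overline{P}(Y \mid A)$.

(iii) $\underline{P}(\lambda X \mid A) = \lambda \underline{P}(X \mid A)$ and $\overline{P}(\lambda X \mid A) = \lambda \overline{P}(X \mid A)$.

(iv) $\underline{P}(X \mid A) = -\overline{P}(-X \mid A)$.

(v) $\underline{P}\bigl(A(X - \underline{P}(X \mid A \cap B)) \mid B\bigr) = 0$.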



Property (v) generalizes the generalized Bayes rule, which describes how conditional lower previsions are linked to unconditional lower previsions. For instance, if $\underline{P}(A \mid B) > 0$, then $\underline{P}(\cdot \mid A \cap B)$ is uniquely determined by $\underline{P}(\cdot \mid B)$ via property (v) [35, p. 297, Thm. 6.4.1].

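Property (v) can be checked numerically. Assuming again a credal set given by finitely many extreme points, the sketch below recovers $\underline{P}(X \mid A \cap B)$ as the root of $\mu \mapsto \underline{P}(A(X - \mu) \mid B)$ by bisection and compares it with the direct minimum over the extreme points. The states, numbers and function names are illustrative only.

```python
def lower_exp(points, Z, B):
    """Lower conditional expectation of Z given B over the hull of `points`."""
    return min(sum(p[w] * Z[w] for w in B) / sum(p[w] for w in B) for p in points)

def lower_via_gbr(points, X, A, B, iters=200):
    """Solve lower_exp(points, A(X - mu), B) = 0 for mu by bisection.

    The map mu -> lower_exp(points, A(X - mu), B) is non-increasing, and strictly
    decreasing when the lower probability of A given B is positive.
    """
    AB = A & B
    def g(mu):
        Z = {w: (X[w] - mu) if w in A else 0.0 for w in B}
        return lower_exp(points, Z, B)
    lo, hi = min(X[w] for w in AB), max(X[w] for w in AB)  # g(lo) >= 0 >= g(hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

points = [{1: 0.2, 2: 0.3, 3: 0.5}, {1: 0.4, 2: 0.4, 3: 0.2}]
X = {1: 1.0, 2: 0.0, 3: 5.0}
A, B = {1, 2}, {1, 2, 3}
direct = lower_exp(points, X, A & B)
print(direct, lower_via_gbr(points, X, A, B))  # both approximately 0.4
```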


3.2. Choice Functions and Optimality. Suppose a subject must choose a gamble from a set $\mathcal{X}$. In classical decision theory, a gamble is optimal if its expectation is maximal. More generally, given a credal set $\mathcal{M}$, the subject might, for example, consider as optimal any gamble whose expectation is maximal for at least one $p \in \mathcal{M}$: in fact, most criteria determine optimal decisions by comparison of gambles. So, we suppose that the subject has a way of determining an optimal subset of any set of gambles, conditional on any event $A$ (think of $A$ as $\mathrm{ev}(T)$):

Definition 4. A choice function opt is an operator that, for each non-empty event $A$, maps each non-empty finite set $\mathcal{X}$ of gambles to a non-empty subset of this set:

$$\emptyset \neq \mathrm{opt}(\mathcal{X} \mid A) \subseteq \mathcal{X}.$$

Note that common uses of choice functions in social choice theory, such as by Sen [32, p. 63, ll. 19-21], do not consider conditioning on events, and define choice functions for arbitrary sets of options, rather than for gambles only.

The interpretation of a choice function is that when the subject can choose among the elements of $\mathcal{X}$, having observed $A$, she would only choose from $\mathrm{opt}(\mathcal{X} \mid A)$. Therefore, we say the elements of $\mathrm{opt}(\mathcal{X} \mid A)$ are optimal (relative to $\mathcal{X}$ and $A$). Note that the subject need not consider them equivalent: adding a small incentive to choose a particular optimal option would not necessarily make it the single preferred option.

We now consider four popular choice functions that have been proposed for choosing between gambles given a coherent lower prevision. Further discussion of the criteria presented here can be found in Troffaes [34].

3.2.1. Maximality. Maximality is based on the following strict partial preference order $>_{\underline{P}|A}$.

Definition 5. Given a coherent lower prevision $\underline{P}$, for any two gambles $X$ and $Y$ we write $X >_{\underline{P}|A} Y$ whenever $\underline{P}(X - Y \mid A) > 0$.

The partial order $>_{\underline{P}|A}$ gives rise to the choice function maximality, proposed by Condorcet […, pp. lvi-lxix, 4e Exemple], Sen [35], and Walley [35], among others.¹

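Definition 6. For any non-empty finite set of gambles $\mathcal{X}$ and each event $A \neq \emptyset$,

$$\mathrm{opt}_{>_{\underline{P}}}(\mathcal{X} \mid A) = \{X \in \mathcal{X} : (\forall Y \in \mathcal{X})(Y \not>_{\underline{P}|A} X)\}.$$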

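Definitions 5 and 6 translate directly into a pairwise check. The sketch below (hypothetical names) assumes a credal set given by finitely many extreme points and applies maximality to the lake district gambles $X$ and $Y$ under the $\epsilon$-contamination model of Section 3.3, with $\epsilon = 0.1$; that credal set is the convex hull of the mass functions $(1 - \epsilon)p + \epsilon\,\delta_\omega$, where $\delta_\omega$ (our notation) is the point mass at $\omega$.

```python
from fractions import Fraction

def lower(points, Z, A):
    """Lower conditional expectation over the hull of finitely many pmfs."""
    return min(sum(p[w] * Z[w] for w in A) / sum(p[w] for w in A) for p in points)

def maximal(gambles, points, A):
    """Names of gambles X such that no Y has lower(Y - X | A) > 0."""
    return [name for name, X in gambles.items()
            if not any(lower(points, {w: Y[w] - X[w] for w in A}, A) > 0
                       for Y in gambles.values())]

# Lake district states (forecast, weather) and the precise pmf of Section 2.3.
states = [("S1", "E1"), ("S1", "E2"), ("S2", "E1"), ("S2", "E2")]
p = dict(zip(states, [Fraction(42, 100), Fraction(18, 100),
                      Fraction(8, 100), Fraction(32, 100)]))
eps = Fraction(1, 10)
# Extreme points of the eps-contamination credal set.
points = [{v: (1 - eps) * p[v] + (eps if v == w else 0) for v in states} for w in states]

X = {w: 10 if w[1] == "E1" else 15 for w in states}  # take the waterproof
Y = {w: 5 if w[1] == "E1" else 20 for w in states}   # leave it at home
S1 = [w for w in states if w[0] == "S1"]
print(maximal({"X": X, "Y": Y}, points, states))  # ['X', 'Y']
print(maximal({"X": X, "Y": Y}, points, S1))      # ['X']
```

Unconditionally the two gambles are incomparable, while given $S_1$ only $X$ survives; these are the comparisons made at $N_2$ and $N_{11}$ in Section 3.3.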


¹Because all probabilities in $\mathcal{M}$ are assumed to be strictly positive, Walley's admissibility condition is implied and hence omitted in Definition 6.



NORMAL FORM BACKWARD INDUCTION FOR DECISION TREES WITH COH. LOWER PREVISIONS 7 



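3.2.2. E-admissibility. Another criterion is E-admissibility, proposed by Levi [17]. Recall that $\underline{P}(\cdot \mid A)$ is the lower envelope of $\{E_p(\cdot \mid A) : p \in \mathcal{M}\}$. For each $p \in \mathcal{M}$ we can maximize expected utility:

$$\mathrm{opt}_p(\mathcal{X} \mid A) = \{X \in \mathcal{X} : (\forall Y \in \mathcal{X})(E_p(Y \mid A) \leq E_p(X \mid A))\}.$$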

Then the set of E-admissible options is defined by:

Definition 7. For any non-empty finite set of gambles $\mathcal{X}$ and each event $A \neq \emptyset$,

$$\mathrm{opt}_{\mathcal{M}}(\mathcal{X} \mid A) = \bigcup_{p \in \mathcal{M}} \mathrm{opt}_p(\mathcal{X} \mid A).$$

A gamble $X$ is therefore E-admissible when it maximizes expected utility under at least one $p \in \mathcal{M}$. Any E-admissible gamble is maximal […, p. 162, ll. 26-28].

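Checking E-admissibility in general means searching over all of $\mathcal{M}$, typically with a linear program. To keep the sketch exact and dependency-free, it assumes the special case in which $\mathcal{M}$ is the segment between two mass functions $p_0$ and $p_1$; for fixed $X$ and $Y$ the sign of $E_p(X - Y \mid A)$ is then affine in the mixing weight, so the weights at which $X$ is optimal form an interval that can be computed exactly. The function name and the example gambles are hypothetical.

```python
from fractions import Fraction

def e_admissible_segment(gambles, p0, p1, A):
    """E-admissible gambles when M = {(1 - lam) p0 + lam p1 : 0 <= lam <= 1}."""
    def num(p, Z):
        # sum_{w in A} p(w) Z(w); the positive factor 1/p(A) does not change signs
        return sum(Fraction(p[w]) * Z[w] for w in A)
    result = []
    for name, X in gambles.items():
        lo, hi, feasible = Fraction(0), Fraction(1), True
        for Y in gambles.values():
            D = {w: X[w] - Y[w] for w in A}
            a0 = num(p0, D)
            slope = num(p1, D) - a0
            # X beats Y at weight lam iff a0 + lam * slope >= 0
            if slope > 0:
                lo = max(lo, -a0 / slope)
            elif slope < 0:
                hi = min(hi, -a0 / slope)
            elif a0 < 0:
                feasible = False
        if feasible and lo <= hi:
            result.append(name)
    return result

p0 = {"a": Fraction(9, 10), "b": Fraction(1, 10)}
p1 = {"a": Fraction(1, 10), "b": Fraction(9, 10)}
gambles = {"X1": {"a": 1, "b": 0}, "X2": {"a": 0, "b": 1},
           "X3": {"a": Fraction(2, 5), "b": Fraction(2, 5)}}
print(e_admissible_segment(gambles, p0, p1, {"a", "b"}))  # ['X1', 'X2']
```

Here $X_3$ is maximal, since $\underline{P}(X_1 - X_3) = \underline{P}(X_2 - X_3) = -3/10$, yet it is not E-admissible, so the inclusion stated above can be strict.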
3.2.3. Interval Dominance. Interval dominance is based on the strict partial preference order $\sqsupset_{\underline{P}|A}$.

Definition 8. Given a coherent lower prevision $\underline{P}$, for any non-empty event $A$ and any two $A$-consistent gambles $X$ and $Y$, we write $X \sqsupset_{\underline{P}|A} Y$ whenever $\underline{P}(X \mid A) > \overline{P}(Y \mid A)$.

This ordering induces a choice function usually called interval dominance [38, 34]:

Definition 9. For any non-empty finite set of gambles $\mathcal{X}$ and each event $A \neq \emptyset$,

$$\mathrm{opt}_{\sqsupset_{\underline{P}}}(\mathcal{X} \mid A) = \{X \in \mathcal{X} : (\forall Y \in \mathcal{X})(Y \not\sqsupset_{\underline{P}|A} X)\}.$$

The above criterion was apparently first introduced by Kyburg [15] and was originally called stochastic dominance.

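Interval dominance only needs one lower and one upper prevision per gamble, which is what makes it cheap. The sketch below (hypothetical names; credal set assumed to be the hull of two extreme points) compares it with maximality on two gambles whose expectation intervals overlap. In general, $\underline{P}(Y) > \overline{P}(X)$ implies $\underline{P}(Y - X) \geq \underline{P}(Y) - \overline{P}(X) > 0$ by Proposition 3(ii) and (iv), so every maximal gamble also survives interval dominance.

```python
from fractions import Fraction

def expectations(points, X, A):
    return [sum(p[w] * X[w] for w in A) / sum(p[w] for w in A) for p in points]

def interval_dominance(gambles, points, A):
    """Keep X unless some Y has lower(Y|A) > upper(X|A) (Definitions 8 and 9)."""
    low = {n: min(expectations(points, X, A)) for n, X in gambles.items()}
    upp = {n: max(expectations(points, X, A)) for n, X in gambles.items()}
    return [n for n in gambles if not any(low[m] > upp[n] for m in gambles)]

def maximal(gambles, points, A):
    """Keep X unless some Y has lower(Y - X|A) > 0 (Definitions 5 and 6)."""
    def lower_diff(Y, X):
        return min(expectations(points, {w: Y[w] - X[w] for w in A}, A))
    return [n for n, X in gambles.items()
            if not any(lower_diff(Y, X) > 0 for Y in gambles.values())]

points = [{"a": Fraction(9, 10), "b": Fraction(1, 10)},
          {"a": Fraction(1, 10), "b": Fraction(9, 10)}]
gambles = {"X": {"a": 1, "b": 3}, "Y": {"a": Fraction(3, 2), "b": Fraction(7, 2)}}
omega = {"a", "b"}
print(interval_dominance(gambles, points, omega))  # ['X', 'Y']
print(maximal(gambles, points, omega))             # ['Y']
```

The intervals $[1.2, 2.8]$ and $[1.7, 3.3]$ overlap, so interval dominance keeps both gambles, whereas maximality notices that $Y - X = 0.5$ everywhere.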
3.2.4. Γ-maximin. Γ-maximin selects the gambles that maximize the minimum expected reward.

Definition 10. For any non-empty finite set of gambles $\mathcal{X}$ and each event $A \neq \emptyset$,

$$\mathrm{opt}_{\underline{P}}(\mathcal{X} \mid A) = \{X \in \mathcal{X} : (\forall Y \in \mathcal{X})(\underline{P}(X \mid A) \geq \underline{P}(Y \mid A))\}.$$

Γ-maximin is induced by a total preorder, and so usually selects a single gamble regardless of the degree of uncertainty in $\underline{P}$. Γ-maximin can be criticized for being too conservative (see Walley [35, p. 164]), as it only takes into account the worst possible scenario.

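A corresponding sketch for Γ-maximin, under the same assumption of a credal set given by finitely many extreme points; names and gambles are hypothetical.

```python
from fractions import Fraction

def lower(points, X, A):
    return min(sum(p[w] * X[w] for w in A) / sum(p[w] for w in A) for p in points)

def gamma_maximin(gambles, points, A):
    """Keep the gambles whose lower expectation given A is largest (Definition 10)."""
    low = {n: lower(points, X, A) for n, X in gambles.items()}
    best = max(low.values())
    return [n for n in gambles if low[n] == best]

points = [{"a": Fraction(9, 10), "b": Fraction(1, 10)},
          {"a": Fraction(1, 10), "b": Fraction(9, 10)}]
gambles = {"X": {"a": 1, "b": 3},
           "Y": {"a": Fraction(3, 2), "b": Fraction(7, 2)},
           "Z": {"a": 2, "b": 2}}
print(gamma_maximin(gambles, points, {"a", "b"}))  # ['Z']
```

Here Γ-maximin keeps only the constant gamble $Z$ (lower expectation $2$), although $Y$ has the higher upper expectation ($3.3$); maximality would keep both $Y$ and $Z$.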
3.3. Sequential Problems Using Lower Previsions. Consider again the lake district problem depicted in Fig. 1, but now suppose that the subject has specified a coherent lower prevision instead of a single probability measure. For this example, we consider an $\epsilon$-contamination model: with probability $1 - \epsilon$, observations follow a given probability mass function $p$, and with probability $\epsilon$, observations follow an unknown arbitrary distribution. One can easily check that, under this model, the lower expectation of a gamble $X$ is

$$\underline{P}(X) = (1 - \epsilon) E_p(X) + \epsilon \min_{\omega \in \Omega} X(\omega).$$

The conditional lower expectation is [35, p. 309]

$$\underline{P}(X \mid A) = \frac{(1 - \epsilon) E_p(AX) + \epsilon \min_{\omega \in A} X(\omega)}{(1 - \epsilon) p(A) + \epsilon}.$$

As before, let $p(S_1) = 0.6$, $p(E_1 \mid S_1) = 0.7$, and $p(E_1 \mid S_2) = 0.2$, so $p(E_1) = 0.5$. Let $\epsilon = 0.1$.

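The closed-form conditional lower expectation above is easy to evaluate exactly. The sketch below (names are ours) encodes the four states $S_iE_j$ with $p(S_1E_1) = 0.42$, $p(S_1E_2) = 0.18$, $p(S_2E_1) = 0.08$ and $p(S_2E_2) = 0.32$, which follow from the probabilities just given, and evaluates three lower expectations that reappear in the backward induction below.

```python
from fractions import Fraction

states = [("S1", "E1"), ("S1", "E2"), ("S2", "E1"), ("S2", "E2")]
p = dict(zip(states, [Fraction(42, 100), Fraction(18, 100),
                      Fraction(8, 100), Fraction(32, 100)]))
eps = Fraction(1, 10)

def lower_eps(X, A=states):
    """Lower expectation of X given A under the eps-contamination model."""
    num = (1 - eps) * sum(p[w] * X[w] for w in A) + eps * min(X[w] for w in A)
    den = (1 - eps) * sum(p[w] for w in A) + eps
    return num / den

X = {w: 10 if w[1] == "E1" else 15 for w in states}
Y = {w: 5 if w[1] == "E1" else 20 for w in states}
S1 = [w for w in states if w[0] == "S1"]
S2 = [w for w in states if w[0] == "S2"]
print(lower_eps({w: X[w] - Y[w] for w in states}))  # -1/2
print(lower_eps({w: X[w] - Y[w] for w in S1}, S1))  # 29/32
print(lower_eps({w: Y[w] - X[w] for w in S2}, S2))  # 29/23
```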
Naively, she could solve the problem with the usual normal form method: she lists all possible 
strategies (actions to take in all eventualities), finds the corresponding gambles, and applies a 
suitable choice function, say maximality. 

Table 1 lists all strategies and their gambles. Each strategy gives a reward determined entirely by $\omega$, and hence has a corresponding gamble. For example, the gamble for the last strategy is

$$(5 - c)S_1E_1 + (20 - c)S_1E_2 + (10 - c)S_2E_1 + (15 - c)S_2E_2 = S_1Y + S_2X - c,$$



strategy                                          gamble
$d_4$, then $d_1$                                 $X$
$d_4$, then $d_2$                                 $Y$
$d_3$, then $d_1$ if $S_1$ and $d_1$ if $S_2$     $X - c$
$d_3$, then $d_2$ if $S_1$ and $d_2$ if $S_2$     $Y - c$
$d_3$, then $d_1$ if $S_1$ and $d_2$ if $S_2$     $S_1X + S_2Y - c$
$d_3$, then $d_2$ if $S_1$ and $d_1$ if $S_2$     $S_1Y + S_2X - c$



Table 1. Strategies and gambles for the lake district problem. 



[Figure 3, panels (i)-(iv): the gambles and the optimal sets retained at each stage of the backward induction described in steps (i)-(iv) below.]

Figure 3. Solving the lake district example by normal form backward induction.



with $X = 10E_1 + 15E_2$ and $Y = 5E_1 + 20E_2$. Recall that $(5 - c)S_1E_1$ is just shorthand notation for $(5 - c)I_{S_1}I_{E_1}$, and similarly for all other terms.

Maximality can then be applied to find the optimal gambles: this requires comparison of all 
six gambles at once. Skipping the details of this calculation, for instance with c = 0.5, we find 
that we should buy the newspaper and follow its advice. 
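The normal form method just described can be sketched as follows: build the six gambles of Table 1 and keep those that no other gamble dominates under maximality. The code assumes the $\epsilon$-contamination model above (for $A = \Omega$ the denominator of the conditional formula equals 1); the labels and function names are hypothetical.

```python
from fractions import Fraction

STATES = [("S1", "E1"), ("S1", "E2"), ("S2", "E1"), ("S2", "E2")]
P = dict(zip(STATES, [Fraction(42, 100), Fraction(18, 100),
                      Fraction(8, 100), Fraction(32, 100)]))
EPS = Fraction(1, 10)

def lower(Z):
    """Unconditional lower expectation under eps-contamination (p(Omega) = 1)."""
    return (1 - EPS) * sum(P[w] * Z[w] for w in STATES) + EPS * min(Z.values())

def table1_gambles(c):
    """The six normal form gambles of Table 1, labelled as in the table."""
    X = {w: 10 if w[1] == "E1" else 15 for w in STATES}
    Y = {w: 5 if w[1] == "E1" else 20 for w in STATES}
    def buy(G, H):  # S1 G + S2 H - c
        return {w: (G if w[0] == "S1" else H)[w] - c for w in STATES}
    return {"d4, then d1": X, "d4, then d2": Y,
            "d3, then d1 if S1 and d1 if S2": buy(X, X),
            "d3, then d2 if S1 and d2 if S2": buy(Y, Y),
            "d3, then d1 if S1 and d2 if S2": buy(X, Y),
            "d3, then d2 if S1 and d1 if S2": buy(Y, X)}

def maximal(gambles):
    """Normal form method: apply maximality to all gambles at once."""
    return [n for n, X in gambles.items()
            if not any(lower({w: Y[w] - X[w] for w in STATES}) > 0
                       for Y in gambles.values())]

print(maximal(table1_gambles(Fraction(1, 2))))
```

For $c = 1/2$ only the strategy that buys the newspaper and follows its advice survives; note that every pair of the six gambles is compared.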

However, could we devise a backward induction scheme that does not require comparing all six gambles at once? Any such scheme cannot work as simply as in Section 2.3, because we are no longer maximizing a single number (i.e., expected utility). Instead, we retain the optimal strategies in subtrees, as illustrated in Fig. 3.

Denote subtrees at a particular node $N^*$ by $T^* = \mathrm{st}_{N^*}(T)$.

(i) First, write down the gambles at the final chance nodes. For example, at $N_{111}$ the gamble is $(10 - c)E_1 + (15 - c)E_2 = X - c$, and similarly for all others.

(ii) Let us first deal with the branch corresponding to refusing the newspaper. At the decision node $N_2$, we have a choice between two strategies that correspond to the gambles $X$ and $Y$. We also have $\mathrm{ev}(T_2) = \Omega$. So, to determine the optimal strategies in this subtree, we must compare these two gambles unconditionally:

$$\underline{P}(X - Y) = \underline{P}(Y - X) = -5\epsilon = -1/2,$$

so at $N_2$ the strategies $d_1$ and $d_2$ are both optimal.

Now we move to the branch corresponding to buying the newspaper. At $N_{11}$, we need to compare $X - c$ and $Y - c$. We have $\mathrm{ev}(T_{11}) = S_1$, and

$$\underline{P}((X - c) - (Y - c) \mid S_1) = \frac{29}{32} > 0,$$

so $X - c >_{\underline{P}|S_1} Y - c$ and the uniquely optimal strategy is $d_1$. Next, considering $N_{12}$, we see that $\mathrm{ev}(T_{12}) = S_2$, and

$$\underline{P}((Y - c) - (X - c) \mid S_2) = \frac{29}{23} > 0,$$

so the optimal strategy here is $d_2$.

(iii) Moving to $N_1$, we see that only one of the original four strategies remains: "$d_1$ if $S_1$ and $d_2$ if $S_2$", corresponding to the gamble $S_1X + S_2Y - c$.

(iv) Finally, considering the entire tree $T$, three strategies are left: "$d_3$, then $d_1$ if $S_1$ and $d_2$ if $S_2$"; "$d_4$, then $d_1$"; "$d_4$, then $d_2$". Therefore we need to find

$$\mathrm{opt}_{>_{\underline{P}}}(\{S_1X + S_2Y - c, X, Y\}).$$

We have

$$\begin{aligned}
\underline{P}(X - (S_1X + S_2Y - c)) &= c - (6 + 19\epsilon)/5 = c - 79/50, \\
\underline{P}((S_1X + S_2Y - c) - X) &= (6 - 31\epsilon)/5 - c = 29/50 - c, \\
\underline{P}(Y - (S_1X + S_2Y - c)) &= c - (6 + 19\epsilon)/5 = c - 79/50, \\
\underline{P}((S_1X + S_2Y - c) - Y) &= (6 - 31\epsilon)/5 - c = 29/50 - c.
\end{aligned}$$

Concluding (see Fig. 3(iv)):

• if the newspaper costs less than 29/50, we should buy it and follow its advice;

• if it costs more than 79/50, we should not buy it, but have insufficient information to decide whether to take the waterproof or not;

• if the newspaper costs between 29/50 and 79/50, we can take any of the three remaining options.

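Steps (ii) to (iv) can be reproduced mechanically. The sketch below hard-codes the lake district tree, uses maximality under the $\epsilon$-contamination model, and applies the choice function at $N_2$, $N_{11}$, $N_{12}$, $N_1$ and finally the root, in that order; all names are illustrative assumptions.

```python
from fractions import Fraction

STATES = [("S1", "E1"), ("S1", "E2"), ("S2", "E1"), ("S2", "E2")]
P = dict(zip(STATES, [Fraction(42, 100), Fraction(18, 100),
                      Fraction(8, 100), Fraction(32, 100)]))
EPS = Fraction(1, 10)

def lower(Z, A):
    """Conditional lower expectation under eps-contamination."""
    num = (1 - EPS) * sum(P[w] * Z[w] for w in A) + EPS * min(Z[w] for w in A)
    return num / ((1 - EPS) * sum(P[w] for w in A) + EPS)

def opt(gambles, A):
    """Maximality given A; `gambles` maps strategy labels to gambles on all states."""
    return {n: X for n, X in gambles.items()
            if not any(lower({w: Y[w] - X[w] for w in A}, A) > 0
                       for Y in gambles.values())}

def root_solution(c):
    X = {w: 10 if w[1] == "E1" else 15 for w in STATES}
    Y = {w: 5 if w[1] == "E1" else 20 for w in STATES}
    S1 = [w for w in STATES if w[0] == "S1"]
    S2 = [w for w in STATES if w[0] == "S2"]
    # (ii) decision nodes: N2 given Omega, N11 given S1, N12 given S2
    n2 = opt({"d4, then d1": X, "d4, then d2": Y}, STATES)
    paid = {"d1": {w: X[w] - c for w in STATES}, "d2": {w: Y[w] - c for w in STATES}}
    n11, n12 = opt(paid, S1), opt(paid, S2)
    # (iii) chance node N1: glue every surviving pair of substrategies, then apply opt
    n1 = opt({f"d3, then {a} if S1 and {b} if S2":
              {w: (G if w[0] == "S1" else H)[w] for w in STATES}
              for a, G in n11.items() for b, H in n12.items()}, STATES)
    # (iv) root: compare the survivors of both branches
    return sorted(opt({**n1, **n2}, STATES))

for c in (Fraction(1, 2), Fraction(1), Fraction(2)):
    print(c, root_solution(c))
```

For $c = 1/2$, $1$ and $2$ this returns, respectively, the buy-and-follow strategy alone, all three remaining strategies, and the two strategies that do not buy the newspaper, matching the three cases above; only three gambles are ever compared at the root.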
Comparing this with the solution calculated in Section 2.3, we observe that the imprecision has created a range of $c$ for which it is unclear whether buying the newspaper is better than not, rather than the single value of $c$ in the precise case. Despite this, should the subject decide to buy the newspaper, she will follow the same policy in both cases: take the waterproof only if the newspaper predicts rain. Finally, it should be noted that, although in both cases both "$d_4$, then $d_1$" and "$d_4$, then $d_2$" are involved in optimal normal form decisions for some values of $c$, in the precise case this is because they are equivalent, whereas in the imprecise case they are incomparable. A tiny increase in value for, say, not taking the waterproof when there is no rain would make "$d_4$, then $d_1$" always non-optimal under $E_p$, but still optimal under $\underline{P}$ for $c > 29/50$.

For this particular example, it is easy though tedious to check that backward induction gives the same answer as the usual normal form method (that is, applying maximality to all six gambles), for any value of $c$. It is easy to find choice functions and decision trees where this does not work [16, …, 30]. We want to know for which choice functions for coherent lower previsions the two methods agree. To answer this, we invoke a theorem on general choice functions, outlined in the next section.
4. Normal Form Solutions for Decision Trees

We now introduce the necessary terminology for examining our two methods of solution in 
detail, and provide theorems stating when they coincide. These two methods yield normal form 
solutions, that is, sets of optimal strategies at the root node. 

4.1. Normal Form Decisions, Solutions, Operators, and Gambles. Suppose the subject specifies a decision for each eventuality, and resolves to follow this policy. She now has no choice at decision nodes, and her reward is entirely determined by the state of nature. This corresponds to a reduced decision tree obtained by taking the initial tree and removing all but one of the arcs at each decision node. Such a reduced tree is called a normal form decision, and represents what we called a "strategy" in Section 3.3. We denote the set of all normal form decisions of $T$ by $\mathrm{nfd}(T)$.

It is unlikely that the subject can specify a single optimal normal form decision for all problems. 
Nevertheless, she might be able to eliminate some unacceptable ones: a normal form solution 
of a decision tree T is simply a non-empty subset of nfd(T). A normal form operator is then a 
function mapping every decision tree to a normal form solution of that tree. The two methods 
we investigate are normal form operators. 

As we saw in Section 3.3, the reward for a normal form decision is determined entirely by the events that take place. That is, a normal form decision has a corresponding gamble, which we call a normal form gamble. The set of all normal form gambles associated with a decision tree $T$ is denoted by $\mathrm{gamb}(T)$, so gamb is an operator on trees that yields the set of all gambles induced by normal form decisions of the tree.

We will need to know when a set of gambles can be represented by a consistent decision tree (as defined in Section 2.2).



Definition 11. Let $A$ be any non-empty event, and let $\mathcal{X}$ be a set of gambles. Then the following conditions are equivalent; if they are satisfied, we say that $\mathcal{X}$ is $A$-consistent.

(A) There is a consistent decision tree $T$ with $\mathrm{ev}(T) = A$ and $\mathrm{gamb}(T) = \mathcal{X}$.

(B) For all $r \in \mathbb{R}$ and all $X \in \mathcal{X}$ such that $X^{-1}(r) \neq \emptyset$, it holds that $X^{-1}(r) \cap A \neq \emptyset$.



The proof of the equivalence of (A) and (B) is straightforward, and is therefore omitted here.

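Condition (B) is easy to test mechanically. A minimal sketch, assuming gambles are dictionaries over a finite possibility space (the function name and example are hypothetical):

```python
def is_A_consistent(gambles, A):
    """Condition (B) of Definition 11 for gambles given as dicts over a finite space."""
    for X in gambles:
        for r in set(X.values()):
            # X^{-1}(r) is non-empty because r is a value of X; it must meet A
            if not any(X[w] == r for w in A):
                return False
    return True

X = {"w1": 0, "w2": 1, "w3": 0}
print(is_A_consistent([X], {"w1", "w3"}))  # False: reward 1 only occurs outside A
print(is_A_consistent([X], {"w1", "w2"}))  # True
```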


The following notation proves convenient for normal form gambles at chance nodes.

Definition 12. For any events $E_1, \dots, E_n$ which form a partition, and any finite family of sets of gambles $\mathcal{X}_1, \dots, \mathcal{X}_n$, we define the following set of gambles:

$$\bigodot_{i=1}^{n} E_i \mathcal{X}_i = \Bigl\{ \sum_{i=1}^{n} E_i X_i : X_i \in \mathcal{X}_i \Bigr\}. \tag{1}$$

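Equation (1) amounts to a Cartesian product followed by piecewise assembly. A minimal sketch with hypothetical names, representing gambles as dictionaries over the states:

```python
from itertools import product

def combine(events, gamble_sets):
    """Eq. (1): all gambles sum_i E_i X_i with X_i taken from gamble_sets[i].

    `events` must partition the (finite) possibility space; gambles are dicts.
    """
    result = []
    for choice in product(*gamble_sets):
        Z = {}
        for E, X in zip(events, choice):
            for w in E:
                Z[w] = X[w]  # on E_i the combined gamble agrees with X_i
        result.append(Z)
    return result

states = [("S1", "E1"), ("S1", "E2"), ("S2", "E1"), ("S2", "E2")]
S1 = [w for w in states if w[0] == "S1"]
S2 = [w for w in states if w[0] == "S2"]
c = 1
X_c = {w: (10 if w[1] == "E1" else 15) - c for w in states}
Y_c = {w: (5 if w[1] == "E1" else 20) - c for w in states}
combos = combine([S1, S2], [[X_c, Y_c], [X_c, Y_c]])
print(len(combos))  # 4
```

For the lake district with $c = 1$, combining $\{X - c, Y - c\}$ on both $S_1$ and $S_2$ yields the four gambles of the strategies that buy the newspaper.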
4.2. Normal Form Operator Induced by a Choice Function. We can now formalize the simple normal form method described at the start of Section 3.3. Listing all strategies corresponds to finding $\mathrm{nfd}(T)$. Listing their corresponding gambles corresponds to finding $\mathrm{gamb}(T)$. Then we calculate $\mathrm{opt}(\mathrm{gamb}(T) \mid \mathrm{ev}(T))$, and find all elements of $\mathrm{nfd}(T)$ that induced these optimal gambles. The solution is then the set of all these normal form decisions. Formally,

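Definition 13. Given any choice function opt, and any decision tree $T$ with $\mathrm{ev}(T) \neq \emptyset$,

$$\mathrm{norm}_{\mathrm{opt}}(T) = \{U \in \mathrm{nfd}(T) : \mathrm{gamb}(U) \subseteq \mathrm{opt}(\mathrm{gamb}(T) \mid \mathrm{ev}(T))\}.$$

The following important equality follows immediately:

$$\mathrm{gamb}(\mathrm{norm}_{\mathrm{opt}}(T)) = \mathrm{opt}(\mathrm{gamb}(T) \mid \mathrm{ev}(T)). \tag{2}$$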




Let us demonstrate this definition on the lake district problem. For any particular strategy $U$, for instance "buy the newspaper and take the waterproof only if the newspaper predicts rain", we can calculate its associated gamble, which in our instance is

$$Z = (10 - c)E_1S_1 + (15 - c)E_2S_1 + (5 - c)E_1S_2 + (20 - c)E_2S_2 = S_1X + S_2Y - c.$$

We check whether $Z$ is optimal in the set of all gambles associated with $T$, that is, whether $Z \in \mathrm{opt}(\mathrm{gamb}(T) \mid \mathrm{ev}(T))$. If so, then $\mathrm{gamb}(U) = \{Z\} \subseteq \mathrm{opt}(\mathrm{gamb}(T) \mid \mathrm{ev}(T))$ and so $U \in \mathrm{norm}_{\mathrm{opt}}(T)$. Otherwise, $\mathrm{gamb}(U) = \{Z\} \not\subseteq \mathrm{opt}(\mathrm{gamb}(T) \mid \mathrm{ev}(T))$ and so $U \notin \mathrm{norm}_{\mathrm{opt}}(T)$. Repeating this procedure for each strategy in $T$ determines $\mathrm{norm}_{\mathrm{opt}}(T)$.

4.3. Normal Form Backward Induction. The operator $\mathrm{norm}_{\mathrm{opt}}$ is a natural and popular choice, but for practical or philosophical reasons one may wish to be able to find it by backward induction. Even for an almost trivial problem such as that of Fig. 1 there are already six normal form decisions. If a tree $T$ has at least $n$ decision nodes in every path from the root to any leaf, and each decision node has at least two children, then there will be at least $2^n$ normal form decisions associated with $T$ (and often a lot more). Working with sets of $2^n$ gambles may be impractical for large $n$, particularly for maximality and E-admissibility, so a method that can avoid applying the choice function to a large set is needed.

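The counting argument can be made concrete: a decision node contributes the sum of its children's counts, and a chance node the product, since a strategy must specify a continuation on every branch. In the sketch below (hypothetical tree encoding), `stage(n)` builds a tree in which every path crosses $n$ binary decision nodes, each followed by a two-branch chance node.

```python
from math import prod

def count_nfd(node):
    """Number of normal form decisions of a tree.

    Hypothetical encoding: ("reward",), ("chance", [children]) or ("decision", [children]).
    """
    if node[0] == "reward":
        return 1
    counts = [count_nfd(child) for child in node[1]]
    # a decision node picks one arc; a chance node needs a strategy on every branch
    return sum(counts) if node[0] == "decision" else prod(counts)

def stage(n):
    """Every path crosses n binary decision nodes, each followed by a two-branch chance node."""
    if n == 0:
        return ("reward",)
    sub = stage(n - 1)
    return ("decision", [("chance", [sub, sub]), ("chance", [sub, sub])])

leaf = ("reward",)
walk = ("decision", [("chance", [leaf, leaf]), ("chance", [leaf, leaf])])
lake = ("decision", [("chance", [walk, walk]), walk])
print(count_nfd(lake))                             # 6
print([count_nfd(stage(n)) for n in range(1, 5)])  # [2, 8, 128, 32768]
```

The lake district tree gives the six normal form decisions mentioned above, while `stage(n)` grows doubly exponentially, far beyond the $2^n$ lower bound.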
Implementation of backward induction is easy when there is a unique choice at every node, but a choice function may not have this property, so we need to adapt the traditional approach. The technique informally introduced in Section 3.3 is a generalization of the method of Kikuti et al. [14], the only difference being that we apply our choice function at all nodes, not just at decision nodes. Although Kikuti et al. also focus on uncertainty models represented by coherent lower previsions, their approach, and hence our generalization, can be used for any choice function on gambles.

The goal of our backward induction algorithm is to reach a normal form solution of $T$ by finding normal form solutions of subtrees of $T$, and using these to remove some elements of $\mathrm{nfd}(T)$ before applying opt. A formal definition requires a good deal of notation that would hinder clarity, so in this paper we prefer a more intuitive, informal approach. A rigorous treatment of our backward induction operator can be found in [10, 11].

The algorithm moves from right to left in the tree as follows. At a subtree $\mathrm{st}_N(T)$, find the set of normal form decisions $\mathrm{nfd}(\mathrm{st}_N(T))$, but remove any strategies that contain substrategies judged non-optimal at any descendant node. For example, in Fig. 1 at $N_1$ there are four normal form decisions, but in the example the substrategy $d_2$ was removed at $N_{11}$, and $d_1$ was removed at $N_{12}$, so the only strategy at $N_1$ that is retained is "$d_1$ if $S_1$, $d_2$ if $S_2$". Next, find the corresponding gambles of all surviving normal form decisions, apply opt, and transform back to optimal normal form decisions. Move to the next layer of nodes, and continue until the root node is reached. This yields a set of normal form decisions at the root node; that is, the algorithm corresponds to a normal form operator, which we call $\mathrm{back}_{\mathrm{opt}}$.

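The following sketch implements the informal description above for maximality under the $\epsilon$-contamination model of Section 3.3. It is not the rigorous operator of [10, 11]: the tree encoding, the representation of strategies as tuples of (decision node, arc) pairs, and all function names are hypothetical. At every node, decision or chance, it applies the choice function conditional on the event observed so far.

```python
from fractions import Fraction

STATES = [("S1", "E1"), ("S1", "E2"), ("S2", "E1"), ("S2", "E2")]
P = dict(zip(STATES, [Fraction(42, 100), Fraction(18, 100),
                      Fraction(8, 100), Fraction(32, 100)]))
EPS = Fraction(1, 10)
E1 = {w for w in STATES if w[1] == "E1"}
E2 = {w for w in STATES if w[1] == "E2"}
S1 = {w for w in STATES if w[0] == "S1"}
S2 = {w for w in STATES if w[0] == "S2"}

def lower(Z, A):
    """Conditional lower expectation under eps-contamination."""
    num = (1 - EPS) * sum(P[w] * Z[w] for w in A) + EPS * min(Z[w] for w in A)
    return num / ((1 - EPS) * sum(P[w] for w in A) + EPS)

def opt(cands, A):
    """Maximality given A on a list of (strategy, gamble) pairs."""
    return [(s, X) for s, X in cands
            if not any(lower({w: Y[w] - X[w] for w in A}, A) > 0 for _, Y in cands)]

def backopt(node, ev):
    """Surviving (strategy, gamble) pairs of the subtree `node`, given observed event `ev`.

    Encoding: ("reward", u), ("chance", [(event, child), ...]) or
    ("decision", name, {arc: child, ...}).
    """
    if node[0] == "reward":
        return [((), {w: node[1] for w in STATES})]
    if node[0] == "chance":
        combos = [((), {})]
        for E, child in node[1]:
            sub = backopt(child, [w for w in ev if w in E])
            combos = [(s + t, {**X, **{w: Y[w] for w in E}})
                      for s, X in combos for t, Y in sub]
        return opt(combos, ev)
    _, name, arms = node
    cands = [(((name, arc),) + s, X) for arc, child in arms.items()
             for s, X in backopt(child, ev)]
    return opt(cands, ev)

def walk(name, cost):
    # d1: take the waterproof, d2: leave it at home
    return ("decision", name, {
        "d1": ("chance", [(E1, ("reward", 10 - cost)), (E2, ("reward", 15 - cost))]),
        "d2": ("chance", [(E1, ("reward", 5 - cost)), (E2, ("reward", 20 - cost))]),
    })

def lake(c):
    buy = ("chance", [(S1, walk("N11", c)), (S2, walk("N12", c))])
    return ("decision", "N", {"d3": buy, "d4": walk("N2", 0)})

for c in (Fraction(1, 2), Fraction(1), Fraction(2)):
    print(c, [s for s, _ in backopt(lake(c), STATES)])
```

For $c = 1/2$, $1$ and $2$ the strategies surviving at the root agree with the conclusions drawn in Section 3.3.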
A further example using maximality, but this time using the notation of this section, can be found in Section 6 if further clarification is required.

In [10, 11], we found four necessary and sufficient properties on opt for $\mathrm{back}_{\mathrm{opt}}$ and $\mathrm{norm}_{\mathrm{opt}}$ to coincide for any consistent decision tree.

Property 1 (Backward conditioning property). Let $A$ and $B$ be events such that $A \cap B \neq \emptyset$ and $\overline{A} \cap B \neq \emptyset$, and let $\mathcal{X}$ be a non-empty finite $A \cap B$-consistent set of gambles, with $\{X, Y\} \subseteq \mathcal{X}$ such that $AX = AY$. Then $X \in \mathrm{opt}(\mathcal{X} \mid A \cap B)$ implies $Y \in \mathrm{opt}(\mathcal{X} \mid A \cap B)$ whenever there is a non-empty finite $\overline{A} \cap B$-consistent set of gambles $\mathcal{Z}$ such that, for at least one $Z \in \mathcal{Z}$,

$$AX + \overline{A}Z \in \mathrm{opt}(A\mathcal{X} + \overline{A}\mathcal{Z} \mid B).$$

This property requires that, if two gambles agree on $A$, it is not possible for exactly one of them to be optimal, conditional on any subset of $A$, unless there is no suitable $\mathcal{Z}$.

Property 2 (Insensitivity of optimality to the omission of non-optimal elements). For any event $A \neq \emptyset$, and any non-empty finite $A$-consistent sets of gambles $\mathcal{X}$ and $\mathcal{Y}$,

$$\mathrm{opt}(\mathcal{X} \mid A) \subseteq \mathcal{Y} \subseteq \mathcal{X} \implies \mathrm{opt}(\mathcal{Y} \mid A) = \mathrm{opt}(\mathcal{X} \mid A).$$

If opt satisfies this property, then removing non-optimal elements from a set does not affect whether or not each of the remaining elements is optimal. The property is called 'insensitivity to the omission of non-optimal elements' by De Cooman and Troffaes [5], and 'property ε' by Sen [35], who attributes this designation to Douglas Blair.

Property 3 (Preservation of non-optimality under the addition of elements). For any event $A \neq \emptyset$, and any non-empty finite $A$-consistent sets of gambles $\mathcal{X}$ and $\mathcal{Y}$,

$$\mathcal{Y} \subseteq \mathcal{X} \implies \mathrm{opt}(\mathcal{Y} \mid A) \supseteq \mathrm{opt}(\mathcal{X} \mid A) \cap \mathcal{Y}.$$

This is called 'property α' by Sen [32], Axiom 7 by Luce and Raiffa […, p. 288], and 'independence of irrelevant alternatives' by Radner and Marschak [21].² It states that any gamble that is non-optimal in a set of gambles $\mathcal{Y}$ is non-optimal in any set of gambles containing $\mathcal{Y}$.

Properties [2] and [3] are together equivalent to the well-known property of path independence, 
which can often be checked more conveniently. 

Property 4. A choice function opt is path independent if, for any non-empty event A, and for 
any finite family of non-empty finite A-consistent sets of gambles Xi, . . . , X^, 



Path independence appears frequently in the social choice literature. Plott 23 gives a detailed 
investigation of path independence and its possible justifications. Path independence is also 
equivalent to Axiom 7' of Luce and Raiffa T^, p. 289]. 

Lemma 14 (Sen 32, Proposition 19]). A choice function opt satisfies Properties\^ and\^ if and 



Property 5 (Backward mixture property). For any events A and B such that B n A =/= and 
B n A =/: 9, any B D A-consistent gamble Z , and any non-empty finite B D A-consistent set of 
gambles X , 

opt {AX +^Z\B) A OY>i{X\ A r\B) + ^Z. 

Theorem 15 (Backward induction theorem). Let opt be any choice function. The following 
conditions are equivalent. 

(A) For any consistent decision tree T, it holds that backopt(r) = normopt(T'). 

(B) opt satisfies Properties\^\^\^ and\^ 



In this section we investigate which of the choice functions for coherent lower previsions satisfy 
the conditions of TheoremfT^B)] Some of the results are based on proofs for more general choice 
functions that can be found in the Appendix. 



This is different from several other properties bearing the same name, such as that of Arrow's Impossibility 
Theorem. For further discussion, see Ray | 27| . 



o^i[X\A) C 3; C A- ^ ov^{y\A) ^ o]it[X\A). 




only if opt satisfies Property^ 



5. Application to Coherent Lower Previsions 
5.1. Maximality. Maximality is induced by a strict partial order, so Properties 2 and 3 hold, by
Proposition A.1.
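As a concrete, hypothetical illustration of Proposition A.1 combined with Lemma 14, the Python sketch below builds maximality from a lower prevision that is the lower envelope of a small, made-up set of strictly positive mass functions, and spot-checks path independence (Property 4) on random splits of a random set of gambles. The names `CREDAL`, `lower_prev`, `dominates`, `opt_max` and `path_independent_on` are ours. Exact rational arithmetic (Python's `fractions`) is used so that the strict inequality behind maximality is not disturbed by rounding. A finite check of this kind illustrates, but does not replace, the proof.

```python
import random
from fractions import Fraction as F

# Hypothetical credal set: three mass functions on four outcomes, each
# strictly positive, so every conditional lower prevision is well defined.
CREDAL = [
    (F(1, 4), F(1, 4), F(1, 4), F(1, 4)),
    (F(1, 2), F(1, 6), F(1, 6), F(1, 6)),
    (F(1, 10), F(2, 10), F(3, 10), F(4, 10)),
]


def lower_prev(gamble, event):
    """Lower envelope of the conditional expectations of `gamble` given `event`."""
    return min(
        sum(p[w] * gamble[w] for w in event) / sum(p[w] for w in event)
        for p in CREDAL
    )


def dominates(y, x, event):
    """Y >_A X iff lower P(Y - X | A) > 0: the strict partial order behind maximality."""
    return lower_prev(tuple(a - b for a, b in zip(y, x)), event) > 0


def opt_max(gambles, event):
    """Maximal elements: gambles not dominated by any gamble in the set."""
    return frozenset(x for x in gambles if not any(dominates(y, x, event) for y in gambles))


def path_independent_on(sets, event):
    """Check Property 4 for one family of sets."""
    union = frozenset().union(*sets)
    survivors = frozenset().union(*(opt_max(s, event) for s in sets))
    return opt_max(union, event) == opt_max(survivors, event)


rng = random.Random(0)
A = frozenset({0, 1, 2, 3})
pool = [tuple(rng.randint(-3, 3) for _ in range(4)) for _ in range(8)]
checks = [
    path_independent_on([frozenset(pool[:k]), frozenset(pool[k:])], A)
    for k in range(1, len(pool))
]
print(all(checks))  # True
```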

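Proposition 16. Maximality satisfies Property 1.

Proof. We prove a stronger result. Let $A$ be a non-empty event, $\mathcal{X}$ be a non-empty finite set of
$A$-consistent gambles, and $\{X, Y\} \subseteq \mathcal{X}$ with $AX = AY$. We show that, for any event $B$ such
that $A \cap B \neq \emptyset$, $X \in \operatorname{opt}_{>_{\underline{P}}}(\mathcal{X}|A \cap B)$ implies $Y \in \operatorname{opt}_{>_{\underline{P}}}(\mathcal{X}|A \cap B)$.

If $X \in \operatorname{opt}_{>_{\underline{P}}}(\mathcal{X}|A \cap B)$, then for every $Z \in \mathcal{X}$, $\underline{P}(Z - X|A \cap B) \le 0$. But $AX = AY$ implies
$(A \cap B)X = (A \cap B)Y$, and, by Proposition ?, $\underline{P}(Z - X|A \cap B) = \underline{P}(Z - Y|A \cap B)$, and so it
immediately follows that $Y \in \operatorname{opt}_{>_{\underline{P}}}(\mathcal{X}|A \cap B)$. □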

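Proposition 17. Maximality satisfies Property 5.

Proof. Consider events $A$ and $B$ such that $A \cap B \neq \emptyset$ and $\overline{A} \cap B \neq \emptyset$, a non-empty finite set
of $A \cap B$-consistent gambles $\mathcal{X}$, and an $\overline{A} \cap B$-consistent gamble $Z$. To establish Property 5, it
suffices to show that for any $Y \in \mathcal{X}$,

$$Y \notin \operatorname{opt}_{>_{\underline{P}}}(\mathcal{X}|A \cap B) \implies AY + \overline{A}Z \notin \operatorname{opt}_{>_{\underline{P}}}(A\mathcal{X} \oplus \overline{A}Z|B).$$

If $Y \notin \operatorname{opt}_{>_{\underline{P}}}(\mathcal{X}|A \cap B)$, then there is an $X \in \mathcal{X}$ with $\underline{P}(X - Y|A \cap B) > 0$. The result follows
if we show that $\underline{P}(AX + \overline{A}Z - (AY + \overline{A}Z)|B) > 0$. By Proposition ?(v),

$$\begin{aligned}
0 &= \underline{P}\bigl(A(X - Y - \underline{P}(X - Y|A \cap B))\big|B\bigr)\\
&\le \underline{P}(A(X - Y)|B) + \overline{P}\bigl(-A\,\underline{P}(X - Y|A \cap B)\big|B\bigr)\\
&= \underline{P}(A(X - Y)|B) - \underline{P}\bigl(A\,\underline{P}(X - Y|A \cap B)\big|B\bigr)\\
&= \underline{P}(A(X - Y)|B) - \underline{P}(A|B)\,\underline{P}(X - Y|A \cap B),
\end{aligned}$$

where we relied on $\underline{P}(X - Y|A \cap B) > 0$ in the last step. So, indeed, since $\underline{P}(A|B) > 0$,

$$\underline{P}(A(X - Y)|B) \ge \underline{P}(A|B)\,\underline{P}(X - Y|A \cap B) > 0.$$

□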
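The first step of this proof is an instance of the generalized Bayes rule: when $\underline{P}(A|B) > 0$, the value $\mu = \underline{P}(X|A \cap B)$ is the unique root of $\mu \mapsto \underline{P}(A(X - \mu)|B)$. The Python sketch below illustrates this numerically under simplifying assumptions of our own: $B$ is the whole possibility space, and $\underline{P}$ is the lower envelope of a small, hypothetical set of mass functions, so that the conditional lower prevision can also be computed directly as a minimum of conditional expectations. Bisection applies because the function is strictly decreasing in $\mu$ when $\underline{P}(A) > 0$; the names `CREDAL`, `gbr` and `solve_gbr` are ours.

```python
# Hypothetical credal set on four outcomes; A = {0, 1} has positive
# probability under every mass function, so lower P(A) > 0.
CREDAL = [
    (0.1, 0.3, 0.4, 0.2),
    (0.3, 0.2, 0.1, 0.4),
    (0.25, 0.25, 0.25, 0.25),
]
A = {0, 1}
X = (4.0, -2.0, 7.0, 1.0)


def lower(gamble):
    """Lower envelope of the (unconditional) expectations."""
    return min(sum(pw * gw for pw, gw in zip(p, gamble)) for p in CREDAL)


def gbr(mu):
    """lower P(A(X - mu)): strictly decreasing in mu because lower P(A) > 0."""
    return lower([X[w] - mu if w in A else 0.0 for w in range(len(X))])


def solve_gbr(lo=-100.0, hi=100.0, tol=1e-12):
    """Bisection for the root of gbr on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gbr(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Direct computation: minimum over the credal set of P(X | A).
direct = min(sum(p[w] * X[w] for w in A) / sum(p[w] for w in A) for p in CREDAL)
print(round(solve_gbr(), 9), round(direct, 9))  # -0.5 -0.5
```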



Corollary 18. For any consistent decision tree $T$, it holds that

$$\operatorname{backopt}_{>_{\underline{P}}}(T) = \operatorname{normopt}_{>_{\underline{P}}}(T).$$

Proof. Immediate, from Propositions A.1, 16, and 17, and Theorem 15. □

5.2. E-admissibility. Since E-admissibility is a union of maximality choice functions, we have:

Corollary 19. For any consistent decision tree $T$, it holds that

$$\operatorname{backopt}_{\mathcal{M}}(T) = \operatorname{normopt}_{\mathcal{M}}(T).$$

Proof. Immediate, from Proposition A.2, Corollary 18, and Theorem 15. □



Further, from Theorem A.3 we have:

Corollary 20. For any consistent decision tree $T$,

$$\operatorname{normopt}_{\mathcal{M}}(T) = \operatorname{normopt}_{\mathcal{M}}(\operatorname{backopt}_{>_{\underline{P}}}(T)).$$
Table 2. Gambles and their lower and upper previsions for Example 21.

  Gamble        value on $\overline{A}$   value on $A$   $\underline{P}(\cdot|B)$   $\overline{P}(\cdot|B)$
  $X$           1                         1              1                          1
  $Y$           1.5                       3.5            2                          3
  $Z$           0                         4              1                          3

  Gamble                   $\underline{P}$   $\overline{P}$
  $BX + \overline{B}Z$     1                 2
  $BY + \overline{B}Z$     1.5               3



5.3. Interval Dominance. By Proposition A.1, interval dominance satisfies Properties 2 and 3,
and it satisfies Property 1 because $AX = AY$ implies $\underline{P}(X|A) = \underline{P}(Y|A)$ and $\overline{P}(X|A) = \overline{P}(Y|A)$.
We now show that interval dominance fails Property 5.

Example 21. Suppose $A$ and $B$ are events, and $X$, $Y$, and $Z$ are the gambles given in Table 2.
Let $\mathcal{M}$ contain all mass functions $P$ such that $A$ and $B$ are independent, $1/4 \le P(A) \le 3/4$,
and $P(B) = 1/2$. Let $\underline{P}$ be the lower envelope of $\mathcal{M}$.

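Lower and upper previsions of relevant gambles are given in Table 2; for example, writing
$p = P(A)$ and using the independence of $A$ and $B$,

$$\begin{aligned}
\overline{P}(BY + \overline{B}Z) &= \max_{P \in \mathcal{M}} P(BY + \overline{B}Z) = \max_{P \in \mathcal{M}} \bigl(P(B)P(Y|B) + P(\overline{B})P(Z|\overline{B})\bigr)\\
&= \frac{1}{2}\max_{P \in \mathcal{M}} \bigl(P(Y) + P(Z)\bigr) = \frac{1}{2}\max_{p \in [1/4, 3/4]} \bigl(1.5(1 - p) + 3.5p + 4p\bigr) = 3,
\end{aligned}$$

and similarly for all other gambles. Clearly, $Y$ interval dominates $X$ conditional on $B$, since
$\underline{P}(Y|B) = 2 > 1 = \overline{P}(X|B)$; however, $BY + \overline{B}Z$ does not interval dominate $BX + \overline{B}Z$, since
$\underline{P}(BY + \overline{B}Z) = 1.5 < 2 = \overline{P}(BX + \overline{B}Z)$, violating Property 5.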
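The entries of Table 2, and the violation of Property 5, can be re-derived with a few lines of Python. The sketch relies on one observation of ours: every prevision involved is affine in $p = P(A)$, so its minimum and maximum over $\mathcal{M}$ are attained at $p = 1/4$ or $p = 3/4$. Gambles are written as pairs of values on $(\overline{A}, A)$, as in Table 2; the helper names `prev`, `bounds` and `interval_dominates` are hypothetical.

```python
from fractions import Fraction as F

# Values on (not A, A), as in Table 2.
X = (F(1), F(1))
Y = (F(3, 2), F(7, 2))
Z = (F(0), F(4))
P_EXTREMES = (F(1, 4), F(3, 4))  # the extreme values of p = P(A) in M


def prev(gamble, p):
    """Expectation of a gamble on (not A, A) when P(A) = p."""
    return (1 - p) * gamble[0] + p * gamble[1]


def bounds(expr):
    """(lower, upper) of an expression that is affine in p."""
    values = [expr(p) for p in P_EXTREMES]
    return min(values), max(values)


# Conditional on B: by independence, P(W | B) = P(W).
cond = {name: bounds(lambda p, g=g: prev(g, p)) for name, g in [("X", X), ("Y", Y), ("Z", Z)]}
# P(BW + (not B)Z) = P(B) P(W) + P(not B) P(Z) = (P(W) + P(Z)) / 2.
mixed = {
    name: bounds(lambda p, g=g: (prev(g, p) + prev(Z, p)) / 2)
    for name, g in [("BX+~BZ", X), ("BY+~BZ", Y)]
}


def interval_dominates(a, b):
    """a and b are (lower, upper) pairs; a dominates b if lower(a) > upper(b)."""
    return a[0] > b[1]


print(interval_dominates(cond["Y"], cond["X"]))              # True
print(interval_dominates(mixed["BY+~BZ"], mixed["BX+~BZ"]))  # False
```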

Even though interval dominance violates Property 5, it can still be of use in backward induction.
It is easily shown (see for instance Troffaes [34]) that

$$\operatorname{opt}_{>_{\underline{P}}}(\mathcal{X}|A) \subseteq \operatorname{opt}_{\sqsupset}(\mathcal{X}|A).$$

By Theorem A.3, we therefore have:

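Corollary 22. For any consistent decision tree $T$,

$$\begin{aligned}
\operatorname{normopt}_{>_{\underline{P}}}(T) &= \operatorname{normopt}_{>_{\underline{P}}}(\operatorname{backopt}_{\sqsupset}(T)),\\
\operatorname{normopt}_{\mathcal{M}}(T) &= \operatorname{normopt}_{\mathcal{M}}(\operatorname{backopt}_{\sqsupset}(T)).
\end{aligned}$$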

It can also be shown that $\operatorname{backopt}_{\sqsupset}(T) \subseteq \operatorname{normopt}_{\sqsupset}(T)$ for all $T$, so all strategies found by
backward induction will be optimal with respect to $\operatorname{opt}_{\sqsupset}$.
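The two facts behind Corollary 22 can be spot-checked in Python: maximality never retains a gamble that interval dominance rejects, and, by Property 2 for maximality, applying maximality after an interval dominance filter gives the same maximal set. The sketch assumes, for illustration only, a lower prevision given as the lower envelope of three made-up mass functions, works unconditionally, and uses exact fractions; the function names are ours.

```python
import random
from fractions import Fraction as F

# Hypothetical lower prevision: lower envelope of three mass functions.
CREDAL = [
    (F(1, 3), F(1, 3), F(1, 3)),
    (F(1, 2), F(1, 4), F(1, 4)),
    (F(1, 5), F(1, 5), F(3, 5)),
]


def lower(g):
    return min(sum(pw * gw for pw, gw in zip(p, g)) for p in CREDAL)


def upper(g):
    return -lower(tuple(-v for v in g))


def opt_interval(gambles):
    """Interval dominance: drop X if some Y has lower P(Y) > upper P(X)."""
    return frozenset(x for x in gambles if not any(lower(y) > upper(x) for y in gambles))


def opt_maximal(gambles):
    """Maximality: drop X if some Y has lower P(Y - X) > 0."""
    return frozenset(
        x for x in gambles
        if not any(lower(tuple(a - b for a, b in zip(y, x))) > 0 for y in gambles)
    )


rng = random.Random(1)
trials = [
    frozenset(tuple(rng.randint(-5, 5) for _ in range(3)) for _ in range(6))
    for _ in range(200)
]
subset_ok = all(opt_maximal(s) <= opt_interval(s) for s in trials)
filter_ok = all(opt_maximal(opt_interval(s)) == opt_maximal(s) for s in trials)
print(subset_ok, filter_ok)  # True True
```

The cheap interval dominance test only needs one lower and one upper prevision per gamble, whereas maximality needs one lower prevision per pair of gambles, which is why it can pay to filter first.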



5.4. Γ-maximin. Γ-maximin fails Theorem 15(A); see, for example, Seidenfeld [?, Sequential
Example 1, pp. 75-77]. Since Γ-maximin is induced by an ordering, it satisfies Properties 2
and 3 by Proposition A.1. As for interval dominance, Γ-maximin satisfies Property 1. Hence,
Γ-maximin must fail Property 5. Indeed, backward induction can fail in a particularly serious way:
it can select a single gamble that is inferior to another normal form gamble. Hence, backward
induction may not find any Γ-maximin gambles.

6. The Oil Wildcatter Example 

We now illustrate our algorithm using the same example as Kikuti et al. [14, Fig. 2]. Fig. 4
depicts the decision tree, with utiles in units of \$10000. The subject must decide whether to drill
for oil ($d_2$) or not ($d_1$). Drilling costs 7 and provides a return of 0, 12, or 27, depending on the
richness of the site. The events $S_1$ to $S_3$ represent the different yields, with $S_1$ being the least
profitable and $S_3$ the most. The subject may pay 1 to test the site before deciding whether to
drill; this gives one of three results $T_1$ to $T_3$, where $T_1$ is the most pessimistic and $T_3$ the most
optimistic.

Figure 4. Decision tree for the oil wildcatter.

Table 3. Unconditional lower and upper probabilities $\underline{P}(T_i)$ and $\overline{P}(T_i)$ for the oil example.

                      $T_1$           $T_2$           $T_3$
  [lower, upper]      0.183, 0.222    0.333, 0.363    0.444, 0.454

Table 4. Conditional lower and upper probabilities $\underline{P}(S_j|T_i)$ and $\overline{P}(S_j|T_i)$ for
the oil example.

             $T_1$           $T_2$           $T_3$
  $S_1$      0.547, 0.653    0.222, 0.333    0.111, 0.166
  $S_2$      0.222, 0.272    0.363, 0.444    0.333, 0.363
  $S_3$      0.125, 0.181    0.250, 0.363    0.471, 0.556

Lower and upper probabilities are given for each $T_i$ (Table 3), and for each $S_j$ conditional
on $T_i$ (Table 4). (Some intervals are tighter than those in Kikuti et al., since their values are
incoherent; we corrected these by natural extension [35, §3.1].)

By marginal extension [35, §6.7.2], the lower prevision of a gamble $Z$ is then

$$\underline{P}(Z) = \underline{P}\bigl(T_1\underline{P}(Z|T_1) + T_2\underline{P}(Z|T_2) + T_3\underline{P}(Z|T_3)\bigr).$$
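The lower previsions used in the calculations that follow can be reproduced with a short Python sketch. We assume, as Tables 3 and 4 suggest, that each credal set consists of all mass functions whose probabilities lie within the listed bounds. Over such a set, a lower expectation is a small linear program that can be solved greedily: start from the lower bounds and assign the remaining mass to the outcomes with the smallest values first. Marginal extension then applies the same routine to the conditional lower previsions. The names `lower_expectation` and `marginal_extension` are ours, and $X = -7S_1 + 5S_2 + 20S_3$ is the drilling gamble used below.

```python
# Probability intervals (lower, upper) from Tables 3 and 4.
P_T = [(0.183, 0.222), (0.333, 0.363), (0.444, 0.454)]
P_S_GIVEN_T = [
    [(0.547, 0.653), (0.222, 0.272), (0.125, 0.181)],  # S1, S2, S3 given T1
    [(0.222, 0.333), (0.363, 0.444), (0.250, 0.363)],  # given T2
    [(0.111, 0.166), (0.333, 0.363), (0.471, 0.556)],  # given T3
]
X = [-7.0, 5.0, 20.0]  # X = -7 S1 + 5 S2 + 20 S3


def lower_expectation(values, bounds):
    """Minimise sum(p * v) subject to lo <= p <= up and sum(p) == 1.

    Greedy: start at the lower bounds, then pour the remaining mass onto
    the outcomes with the smallest values first.
    """
    p = [lo for lo, _ in bounds]
    slack = 1.0 - sum(p)
    for i in sorted(range(len(values)), key=lambda k: values[k]):
        add = min(slack, bounds[i][1] - bounds[i][0])
        p[i] += add
        slack -= add
    return sum(pi * v for pi, v in zip(p, values))


def marginal_extension(conditional_lowers):
    """Lower prevision of T1*c1 + T2*c2 + T3*c3, with c_i = lower P(Z | T_i)."""
    return lower_expectation(conditional_lowers, P_T)


cond = [lower_expectation(X, P_S_GIVEN_T[i]) for i in range(3)]
upper_T1 = -lower_expectation([-x for x in X], P_S_GIVEN_T[0])
print([round(c, 3) for c in cond])         # [-0.961, 4.754, 10.073]
print(round(upper_T1, 3))                  # 1.151
print(round(marginal_extension(cond), 6))  # 5.846906
```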

Let $X = -7S_1 + 5S_2 + 20S_3$, and again let $T^{*} = \operatorname{st}_{*}(T)$. Since we will only be concerned with
maximality, and normal form decisions in this problem are uniquely identified by their gambles,
we can conveniently work with gambles in this example. Therefore, we use the following notation:

$$\operatorname{opt} = \operatorname{opt}_{>_{\underline{P}}}, \qquad \operatorname{back} = \operatorname{gamb} \circ \operatorname{backopt}_{>_{\underline{P}}}, \qquad \operatorname{norm} = \operatorname{gamb} \circ \operatorname{normopt}_{>_{\underline{P}}}.$$

Figure 5. Solving the oil wildcatter example by normal form backward induction.

Fig. 5 depicts the process of backward induction described next.

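(i) Of course, $\operatorname{back}(\cdot)$ at the final chance nodes simply reports the gamble: it returns $\{X - 1\}$ at
each of the three chance nodes reached by testing and then drilling, and $\{X\}$ at the chance node
reached by drilling without testing.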

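(ii) At the decision node reached after observing $T_1$, we must find $\underline{P}((X - 1) - (-1)|T_1)$ and
$\underline{P}(-1 - (X - 1)|T_1)$. These lower previsions can be computed using Table 4 as follows: $X$ will
have lowest expected value when the worst outcome $S_1$ is most likely (probability 0.653) and
the best outcome $S_3$ is least likely (probability 0.125), and so the probability of $S_2$ is 0.222.
So, $\underline{P}(X|T_1) = -7 \times 0.653 + 5 \times 0.222 + 20 \times 0.125 = -0.961$. Similarly, $\underline{P}(-X|T_1) = -1.151$.
Neither of these is positive, so $\operatorname{back}(\cdot) = \{X - 1, -1\}$ at this node.

At the decision node reached after $T_2$, $\underline{P}((X - 1) - (-1)|T_2) = 4.754 > 0$, so drilling ($d_2$)
dominates not drilling ($d_1$), and $\operatorname{back}(\cdot) = \{X - 1\}$. Similarly, $\underline{P}((X - 1) - (-1)|T_3) = 10.073 > 0$,
so $\operatorname{back}(\cdot) = \{X - 1\}$ at the decision node reached after $T_3$.

At the decision node reached without testing, we need to find $\underline{P}(X - 0)$. By marginal
extension, putting the largest weight allowed by Table 3 on $T_1$ and the smallest on $T_3$, we have

$$\begin{aligned}
\underline{P}(X) &= \underline{P}\bigl(T_1\underline{P}(X|T_1) + T_2\underline{P}(X|T_2) + T_3\underline{P}(X|T_3)\bigr)\\
&= \underline{P}(-0.961T_1 + 4.754T_2 + 10.073T_3)\\
&= 0.222 \times (-0.961) + 0.334 \times 4.754 + 0.444 \times 10.073 = 5.846906.
\end{aligned}$$

This is greater than zero, so $\operatorname{back}(\cdot) = \{X\}$ at this node.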

(iii) At the chance node reached after testing, there are two potentially optimal gambles:
$T_1(X - 1) + T_2(X - 1) + T_3(X - 1) = X - 1$ and $T_1(-1) + T_2(X - 1) + T_3(X - 1) = (T_2 + T_3)X - 1$.
We must find $\underline{P}((X - 1) - ((T_2 + T_3)X - 1)) = \underline{P}(T_1X)$ and $\underline{P}(((T_2 + T_3)X - 1) - (X - 1)) = \underline{P}(-T_1X)$.
Using marginal extension,

$$\begin{aligned}
\underline{P}(T_1X) &= \underline{P}(T_1\underline{P}(X|T_1)) = \underline{P}(-0.961T_1) = 0.222 \times (-0.961) = -0.213342 < 0,\\
\underline{P}(-T_1X) &= \underline{P}(T_1\underline{P}(-X|T_1)) = \underline{P}(-1.151T_1) = 0.222 \times (-1.151) = -0.255522 < 0,
\end{aligned}$$

so $\operatorname{back}(\cdot) = \{X - 1, (T_2 + T_3)X - 1\}$ at this node.
(iv) Finally, for $T$, we must consider $\{X, X - 1, (T_2 + T_3)X - 1\}$. It is clear that
$\underline{P}(X - (X - 1)) = 1 > 0$, so $X - 1$ can be eliminated. It is also clear that if a gamble does not
dominate $X - 1$, then it also does not dominate $X$, so by our calculation in step (iii) we know
that $X$ is maximal. We finally have

$$\underline{P}(X - ((T_2 + T_3)X - 1)) = \underline{P}(T_1X + 1) = \underline{P}(T_1X) + 1 = -0.213342 + 1 > 0,$$

so $\operatorname{back}(T) = \{X\}$. So, the optimal strategy is: do not test, and just drill.

We found a single maximal strategy. By Corollary 20, it is also the unique E-admissible
strategy. (Our solution differs from that of Kikuti et al. [14]; since they do not detail their
calculations, we could not identify why.) Of course, if the imprecision were larger, we might have
found more, but the example does show that non-trivial sequential problems can give unique
solutions even when probabilities are imprecise.

In this example, the usual normal form method requires comparing 10 gambles at once. By
normal form backward induction, we only had to compare 2 gambles at once at each stage (except
at the end, where we had 3), leading us much more quickly to the solution: the computational
benefit of normal form backward induction is obvious.
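For comparison, the usual normal form can be solved by brute force. The Python sketch below enumerates the ten normal form gambles (two without testing; after testing, one for each of the $2^3$ drilling plans) on the nine outcomes $T_i \cap S_j$, evaluates lower previsions by marginal extension, and applies maximality to all ten at once. It repeats the table values and the greedy interval routine, under the same assumption that the credal sets are exactly those described by the probability bounds, so as to be self-contained; the labels and helper names are ours.

```python
from itertools import product

# Probability intervals (lower, upper) copied from Tables 3 and 4.
P_T = [(0.183, 0.222), (0.333, 0.363), (0.444, 0.454)]
P_S_GIVEN_T = [
    [(0.547, 0.653), (0.222, 0.272), (0.125, 0.181)],
    [(0.222, 0.333), (0.363, 0.444), (0.250, 0.363)],
    [(0.111, 0.166), (0.333, 0.363), (0.471, 0.556)],
]
X = [-7.0, 5.0, 20.0]  # value of drilling on S1, S2, S3


def lower_expectation(values, bounds):
    """Greedy lower expectation over all mass functions within the bounds."""
    p = [lo for lo, _ in bounds]
    slack = 1.0 - sum(p)
    for i in sorted(range(len(values)), key=lambda k: values[k]):
        add = min(slack, bounds[i][1] - bounds[i][0])
        p[i] += add
        slack -= add
    return sum(pi * v for pi, v in zip(p, values))


def lower_prevision(gamble):
    """Marginal extension; gamble[i][j] is the value on T_i and S_j."""
    conditional = [lower_expectation(row, P_S_GIVEN_T[i]) for i, row in enumerate(gamble)]
    return lower_expectation(conditional, P_T)


def strategies():
    """The 10 normal form gambles, keyed by a readable label."""
    out = {
        "no test, drill": [X] * 3,
        "no test, do not drill": [[0.0] * 3] * 3,
    }
    for plan in product((False, True), repeat=3):  # drill after T1, T2, T3?
        drilled = ",".join(f"T{i + 1}" for i in range(3) if plan[i]) or "none"
        out[f"test, drill after {drilled}"] = [
            [x - 1 for x in X] if plan[i] else [-1.0] * 3 for i in range(3)
        ]
    return out


def maximal(gambles):
    """Labels of gambles G with lower P(H - G) <= 0 for every H (maximality)."""
    def diff(h, g):
        return [[a - b for a, b in zip(rh, rg)] for rh, rg in zip(h, g)]

    return [
        label
        for label, g in gambles.items()
        if not any(lower_prevision(diff(h, g)) > 0 for h in gambles.values())
    ]


print(len(strategies()))      # 10
print(maximal(strategies()))  # ['no test, drill']
```

The brute-force answer agrees with the backward induction result, as Corollary 18 predicts, but it needs all pairwise comparisons among the ten gambles.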

7. Conclusion 

When solving sequential decision problems with limited knowledge, it may be impossible to 
assign a probability to every event. An alternative is to use a coherent lower prevision, or, 
equivalently, a closed convex set of probability mass functions. Under such a model there are 
several plausible generalizations of maximizing expected utility. 

Given any criterion, we considered two methods of solving decision trees: the usual normal
form, where the subject applies the criterion to the list of all strategies, and normal form backward
induction, adapted from [14]. If they coincide, backward induction helps to solve trees efficiently.
If they differ, doubt is cast on the criterion's suitability.

In Theorem 15, we identified when the two methods coincide. We then applied these results
to the choice functions for coherent lower previsions. As was already known, Γ-maximin fails
Property 5. Interval dominance fails the same condition. However, and perhaps surprisingly,
maximality and E-admissibility satisfy all conditions, in the case where lower probabilities are
non-zero. If any lower probabilities are zero, then Property 5 fails (unsurprisingly, as in this case
it already fails with precise probabilities).

When analysing choice functions, whether for lower previsions or not, Property 5 is usually the
most troublesome. Failing Property 1 would involve an unnatural form of conditioning, and
path independence (Property 4) is a very natural consistency condition that one usually wants
to satisfy before even considering decision trees.

We have not argued that a normal form operator, and normopt in particular, gives the best
solution to a decision tree. A normal form solution requires a policy to be specified and adhered
to. The subject adheres to this policy only by her own resolution: she can, of course, change her
policy upon reaching a decision node [29]. One might argue that a normal form solution is only
acceptable when the subject cannot change her mind (for example, if she instructs others, in
advance, to carry out the actions).

Further, many choice functions cause normopt to have undesirable properties, even when they
satisfy Properties 1, 2, 3, and 5. For example, using maximality or E-admissibility with a lower
prevision allows the subject to choose to pay for apparently irrelevant information instead of
making an immediate decision [31], and a gamble that is optimal in a subtree may become
non-optimal in the full tree [5, ?, 11].

Moreover, normal form backward induction does not always help with computations. First,
the need to store all optimal gambles at every stage could be a burden. Secondly, if imprecision
is large, so that only few gambles are deleted, the set of optimal gambles at each stage will
still eventually become too large. In such situations, a form of approximation may be necessary.
Even so, we have shown that, perhaps surprisingly, the normal form can be solved exactly with
backward induction, and when either the trees or the imprecision are not too large, the method
will be computationally feasible.
Appendix A. Results for General Choice Functions

This appendix details intermediate results required for the proofs in Section 5. Since the
results are applicable to choice functions that have nothing to do with coherent lower previsions,
and so may be useful for investigating other uncertainty models, we present them separately.

Proposition A.1. For each non-empty event $A$, let $\succ_A$ be any strict partial order on $A$-consistent
gambles. The choice function induced by these strict partial orders, that is,

$$\operatorname{opt}_{\succ}(\mathcal{X}|A) = \{X \in \mathcal{X} : (\forall Y \in \mathcal{X})(Y \not\succ_A X)\},$$

satisfies Properties 2 and 3.

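Proof. By Lemma 14, it suffices to show that $\operatorname{opt}_{\succ}$ is path independent. Let $\mathcal{X}_1, \dots, \mathcal{X}_n$ be
non-empty finite sets of $A$-consistent gambles, and let $A$ be a non-empty event. Let $\mathcal{X} = \bigcup_{i=1}^{n}\mathcal{X}_i$
and $\mathcal{Z} = \bigcup_{i=1}^{n}\operatorname{opt}_{\succ}(\mathcal{X}_i|A)$. We must show that

$$\operatorname{opt}_{\succ}(\mathcal{X}|A) = \operatorname{opt}_{\succ}(\mathcal{Z}|A). \tag{3}$$

By definition,

$$\operatorname{opt}_{\succ}(\mathcal{Z}|A) = \{Z \in \mathcal{Z} : (\forall Y \in \mathcal{Z})(Y \not\succ_A Z)\},$$

and observe that, if $X \in \mathcal{X}$ but $X \notin \mathcal{Z}$, then, by transitivity of $\succ_A$ and finiteness of $\mathcal{X}$, there is a
$Y \in \mathcal{Z}$ such that $Y \succ_A X$. Therefore, again by transitivity of $\succ_A$, for any $Z \in \mathcal{Z}$ such that
$X \succ_A Z$, we have $Y \succ_A Z$. So,

$$\operatorname{opt}_{\succ}(\mathcal{Z}|A) = \{Z \in \mathcal{Z} : (\forall Y \in \mathcal{X})(Y \not\succ_A Z)\},$$

and once again, by definition of $\mathcal{Z}$, if $X \in \mathcal{X}$ but $X \notin \mathcal{Z}$, there is a $Y \in \mathcal{X}$ such that $Y \succ_A X$, so
we have

$$\operatorname{opt}_{\succ}(\mathcal{Z}|A) = \{X \in \mathcal{X} : (\forall Y \in \mathcal{X})(Y \not\succ_A X)\} = \operatorname{opt}_{\succ}(\mathcal{X}|A).$$

□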



Proposition A.2. Let $\{\operatorname{opt}_i : i \in I\}$ be a family of choice functions. For any non-empty event
$A$ and any non-empty finite set of $A$-consistent gambles $\mathcal{X}$, let

$$\operatorname{opt}(\mathcal{X}|A) = \bigcup_{i \in I}\operatorname{opt}_i(\mathcal{X}|A).$$

(i) If each $\operatorname{opt}_i$ satisfies Property 2, then so does opt.

(ii) If each $\operatorname{opt}_i$ satisfies Property 3, then so does opt.

(iii) If each $\operatorname{opt}_i$ satisfies Property 5, then so does opt.

(iv) If each $\operatorname{opt}_i$ satisfies Properties 1, 3, and 5, then opt satisfies Property 1.



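Proof. (i) By definition of opt and by assumption, for any finite non-empty sets of gambles
$\mathcal{X}$ and $\mathcal{Y}$ such that $\operatorname{opt}(\mathcal{X}|A) \subseteq \mathcal{Y} \subseteq \mathcal{X}$, and for any $i \in I$, $\operatorname{opt}_i(\mathcal{X}|A) \subseteq \operatorname{opt}(\mathcal{X}|A) \subseteq \mathcal{Y}$, and
therefore, by Property 2, $\operatorname{opt}_i(\mathcal{X}|A) = \operatorname{opt}_i(\mathcal{Y}|A)$. Whence,

$$\operatorname{opt}(\mathcal{Y}|A) = \bigcup_{i \in I}\operatorname{opt}_i(\mathcal{Y}|A) = \bigcup_{i \in I}\operatorname{opt}_i(\mathcal{X}|A) = \operatorname{opt}(\mathcal{X}|A).$$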
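(ii) By assumption, for any finite non-empty sets of gambles $\mathcal{X}$ and $\mathcal{Y}$ such that $\mathcal{Y} \subseteq \mathcal{X}$, and
for any $i \in I$, $\operatorname{opt}_i(\mathcal{Y}|A) \supseteq \operatorname{opt}_i(\mathcal{X}|A) \cap \mathcal{Y}$. Therefore,

$$\begin{aligned}
\operatorname{opt}(\mathcal{Y}|A) = \bigcup_{i \in I}\operatorname{opt}_i(\mathcal{Y}|A) &\supseteq \bigcup_{i \in I}\bigl(\operatorname{opt}_i(\mathcal{X}|A) \cap \mathcal{Y}\bigr)\\
&= \mathcal{Y} \cap \bigcup_{i \in I}\operatorname{opt}_i(\mathcal{X}|A) = \operatorname{opt}(\mathcal{X}|A) \cap \mathcal{Y}.
\end{aligned}$$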



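(iii) By assumption, for any non-empty finite set of gambles $\mathcal{X}$, any gamble $Z$, any events $A$
and $B$ such that $A \cap B \neq \emptyset$ and $\overline{A} \cap B \neq \emptyset$, and for any $i \in I$,

$$\operatorname{opt}_i(A\mathcal{X} \oplus \overline{A}Z|B) \subseteq A\operatorname{opt}_i(\mathcal{X}|A \cap B) \oplus \overline{A}Z,$$

whence

$$\begin{aligned}
\operatorname{opt}(A\mathcal{X} \oplus \overline{A}Z|B) &= \bigcup_{i \in I}\operatorname{opt}_i(A\mathcal{X} \oplus \overline{A}Z|B)\\
&\subseteq \bigcup_{i \in I}\bigl(A\operatorname{opt}_i(\mathcal{X}|A \cap B) \oplus \overline{A}Z\bigr)\\
&= \overline{A}Z \oplus A\bigcup_{i \in I}\operatorname{opt}_i(\mathcal{X}|A \cap B)\\
&= \overline{A}Z \oplus A\operatorname{opt}(\mathcal{X}|A \cap B).
\end{aligned}$$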

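(iv) Let $A$ and $B$ be events such that $A \cap B \neq \emptyset$ and $\overline{A} \cap B \neq \emptyset$, let $\mathcal{Z}$ be a non-empty finite
set of $\overline{A} \cap B$-consistent gambles, and let $\mathcal{X}$ be a non-empty finite set of $A \cap B$-consistent gambles
such that there is $\{X, Y\} \subseteq \mathcal{X}$ with $AX = AY$. Suppose that there is a $Z \in \mathcal{Z}$ such that
$AX + \overline{A}Z \in \operatorname{opt}(A\mathcal{X} \oplus \overline{A}\mathcal{Z}|B)$.

By definition of opt, there is a $j$ such that $AX + \overline{A}Z \in \operatorname{opt}_j(A\mathcal{X} \oplus \overline{A}\mathcal{Z}|B)$. We show that
both $X$ and $Y$ are in $\operatorname{opt}_j(\mathcal{X}|A \cap B)$, and therefore are both in $\operatorname{opt}(\mathcal{X}|A \cap B)$. It follows from
Properties 3 and 5 that

$$\operatorname{opt}_j(A\mathcal{X} \oplus \overline{A}\mathcal{Z}|B) \subseteq A\operatorname{opt}_j(\mathcal{X}|A \cap B) \oplus \overline{A}\operatorname{opt}_j(\mathcal{Z}|\overline{A} \cap B).$$

Therefore, there is a $V \in \operatorname{opt}_j(\mathcal{X}|A \cap B)$ with $AV = AX$. Finally, $\operatorname{opt}_j$ satisfies Property 1, and
therefore both $X$ and $Y$ must be in $\operatorname{opt}_j(\mathcal{X}|A \cap B)$. This establishes Property 1 for opt.

□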
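Part (ii) can be spot-checked numerically. The Python sketch below takes, as a hypothetical example, the union of precise expected utility maximisers for three made-up mass functions (a finite analogue of E-admissibility; with a convex set of mass functions, the union would range over all of its elements, not only over a finite list) and tests Property 3 on random nested pairs of sets, using exact fractions. The names `MASSES`, `opt_precise`, `opt_union` and `property_3_holds` are ours.

```python
import random
from fractions import Fraction as F

# Hypothetical finite list of mass functions on three outcomes.
MASSES = [
    (F(1, 2), F(1, 4), F(1, 4)),
    (F(1, 6), F(1, 3), F(1, 2)),
    (F(1, 3), F(1, 3), F(1, 3)),
]


def opt_precise(p):
    """Choice function that keeps the gambles with maximal expectation under p."""
    def opt(gambles):
        value = {g: sum(pw * gw for pw, gw in zip(p, g)) for g in gambles}
        best = max(value.values())
        return frozenset(g for g in gambles if value[g] == best)
    return opt


MEMBERS = [opt_precise(p) for p in MASSES]


def opt_union(gambles):
    """Union of the member choice functions, as in Proposition A.2."""
    return frozenset().union(*(opt(gambles) for opt in MEMBERS))


def property_3_holds(big, small):
    """For small a subset of big: opt(small) contains opt(big) intersected with small."""
    return opt_union(small) >= (opt_union(big) & small)


rng = random.Random(2)
checks = []
for _ in range(300):
    big = frozenset(tuple(rng.randint(-4, 4) for _ in range(3)) for _ in range(6))
    small = frozenset(rng.sample(sorted(big), k=max(1, len(big) // 2)))
    checks.append(property_3_holds(big, small))
print(all(checks))  # True
```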



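The final result is the following: if $\operatorname{opt}_1$ satisfies the necessary properties, $\operatorname{opt}_2$ does not,
but $\operatorname{opt}_1 \subseteq \operatorname{opt}_2$, then we can use $\operatorname{normopt}_1(\operatorname{backopt}_2(\cdot))$ to find $\operatorname{normopt}_1$. This could be of interest
in situations where $\operatorname{opt}_2$ is much more computationally efficient than $\operatorname{opt}_1$, and still eliminates
enough gambles to be useful.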

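Theorem A.3. Let $\operatorname{opt}_1$ and $\operatorname{opt}_2$ be choice functions such that $\operatorname{opt}_1$ satisfies Properties 1, 2, 3,
and 5, and, for any non-empty event $A$ and any non-empty finite set of $A$-consistent gambles
$\mathcal{X}$,

$$\operatorname{opt}_1(\mathcal{X}|A) \subseteq \operatorname{opt}_2(\mathcal{X}|A).$$

Then, for any consistent decision tree $T$,

$$\operatorname{normopt}_1(T) = \operatorname{normopt}_1(\operatorname{backopt}_2(T)). \tag{4}$$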